Abstract


Ladle  is  a  language  for  specifying  the  structure  of  certain  kinds  of  formal  languages.  The  name 
stands  for  LAnguage  Description  LanguagE. 

A  Ladle  specification  defines  two  structural  aspects  of  language  representation:  lexical  and 
syntactic.  (A  semantic  specification  will  be  added  in  a  future  release.)  The  syntax  description 
encompasses  the  abstract  syntax  of  the  language,  the  internal  tree  representation  of  this  abstract 
syntax,  and  how  to  parse  and  unparse  such  syntax  trees. 

The  Ladle  processor  transforms  a  language  specification  into  a  set  of  tables  that  are  used 
by  the  interactive  language-based  editor  Pan  I  to  map  between  text  and  abstract  syntax  trees, 
using  either  bottom-up  parsing  or  structural  elaboration.  Access  to  the  tables  is  provided  by  a 
client  interface  for  Ladle. 

The  report  first  gives  some  background  information  and  discusses  the  functionality  of  the 
Ladle  processor  at  a  fairly  high  level.  The  theoretical  basis  for  Ladle  is  described.  Subsequent 
sections  specify  Ladle’s  input  format  and  semantics,  its  output  data  and  format,  and  the  client 
interface  to  Ladle’s  output  tables.


0  Introduction


Ladle is a language for specifying the structure of languages. The name stands for LAnguage
Description LanguagE. A Ladle specification defines two structural aspects of language representation:
lexical and syntactic [2]. The Ladle processor transforms a language specification into a
set  of  tables  which  contain  enough  information  to  represent  instances  of  the  language  as  text,  as 
sequences  of  lexical  objects  such  as  identifiers,  integers,  etc.,  or  as  syntax  trees,  and  to  convert 
between  these  representations.  The  tables  also  permit  direct  construction  and  manipulation  of 
syntax  trees  representing  language  instances.  Access  to  the  tables  is  provided  by  a  client  interface 
for  Ladle. 

Section  1  gives  some  background  information  and  discusses  the  functionality  of  the  Ladle  processor 
at  a  fairly  high  level.  Section  2  contains  the  theoretical  basis  for  Ladle.  Section  3  specifies  Ladle’s 
input  format  and  semantics.  Section  4  details  Ladle’s  output  data  and  format.  Section  5  describes 
the  client  interface  to  Ladle’s  output  tables.  There  are  also  appendices  which  contain  important 
notational  conventions,  examples,  and  some  auxiliary  information.  In  particular,  Appendix  A 
describes  many  of  the  mathematical  notations  and  conventions  used  throughout  the  document. 
The  casual  reader  may  wish  to  skim  Section  2  or  skip  Sections  4  and  5  entirely. 


1  Overview 


The  core  of  a  Ladle  language  specification  is  the  description  of  the  syntactic  structure  of  the 
language. This description specifies the abstract syntax of the language, the internal tree representation
of this abstract syntax, and how to parse and unparse such syntax trees. While these
aspects  are  closely  tied  together,  each  has  a  certain  amount  of  flexibility  independent  of  the  others. 
The  remainder  of  this  section  elaborates  each  of  these  aspects  of  syntax. 


1.1  Abstract  Syntax 

The  abstract  syntax  of  a  language  is  a  description  of  the  complete  syntactic  structure  of  the  language 
as  it  is  understood  by  the  language’s  users.  For  example,  an  abstract  syntax  for  a  programming 
language  must  contain  a  structure  for  each  kind  of  statement  and  expression  in  the  language,  and 
must include all of the language’s keywords, such as BEGIN and END. Note that this definition
contrasts  with  some  other  usages  of  the  term  “abstract  syntax”,  where  keywords,  parentheses,  and 
other  purely  syntactic  symbols  are  omitted. 

In  Ladle,  the  abstract  syntax  of  a  language  is  defined  by  an  extended  context  free  grammar  for 
the  language,  called  the  abstract  grammar.  The  abstract  grammar  need  not  be  in  any  particular 
grammatical  class  such  as  LL  or  LALR.  It  may  even  be  ambiguous.  The  abstract  grammar  will 
typically contain exactly one rule for each construct in the language, e.g. a declaration list, a
conditional statement, an addition expression. However, almost any correct grammar can be used
if the precise derivation of an instance of the language is not of interest. Note that since no
restrictions are placed on abstract grammars, it is unnecessary to transform a desired grammar
into one that will work for Ladle.

[2] A semantic specification will be added in a future release.

In  Ladle,  the  structure  of  an  instance  of  a  language  is  defined  by  an  abstract  grammar  derivation 
that rewrites the start symbol as that instance. An abstract syntax tree (or just syntax tree)
represents an abstract grammar derivation that rewrites a single non-terminal. A derivation may
rewrite a non-terminal besides the initial symbol of the grammar or may result in a phrasal form
that  is  not  a  terminal  string.  The  rhs  of  such  a  derivation  is  not  an  instance  of  the  language,  but 
a  syntax  tree  representing  that  derivation  is  still  valid.  Thus  a  syntax  tree  does  not  necessarily 
represent  an  instance  of  a  language,  but  may  represent  a  structured  instance  fragment. 


1.2  Syntax  Tree  Internal  Representation 


A  syntax  tree  represents  the  structure  of  a  phrasal  form  of  a  language  as  an  abstract  derivation. 
The  immediate  sub-tree  of  a  node  in  a  syntax  tree  is  the  sub-tree  consisting  of  that  node  and  its 
children.  Conceptually,  the  correspondence  between  a  syntax  tree  and  a  derivation  is  that  each 
immediate sub-tree corresponds to the application of one rewrite rule. Thus the internal nodes of a
syntax tree represent abstract non-terminals and the leaves represent abstract terminals and non-terminals.
An internal node can also be considered to represent the rewrite rule represented by that
node’s immediate sub-tree. (Some leaf nodes can be similarly considered to represent empty rules,
and therefore ε as well.) Note that the root of a syntax tree represents the lhs of the derivation
represented by the tree, and the frontier of the tree represents the rhs of the derivation.

The  internal  representation  (IR)  of  a  syntax  tree  may  be  more  compact  than  an  exact  representation 
of the syntax tree. An important technique for reducing tree size is to have each internal tree node
designate the rewrite rule represented by the node’s immediate sub-tree rather than that rule’s
non-terminal lhs, since the lhs is easily computed from the rule. (Again, a leaf node may designate
an empty rule.) There are then two methods that can be used to compact an IR tree. One makes
the  tree  less  broad  by  eliminating  terminal  leaf  nodes  and  the  other  makes  the  tree  less  deep  by 
eliminating  rule  nodes. 

An  IR  tree  need  not  represent  every  terminal  in  a  rewrite  rule’s  rhs  explicitly.  The  tree  node 
representing  a  rule  may  have  child  sub-trees  representing  derivations  that  rewrite  the  symbols  in 
the  rhs  of  the  rule.  The  node  must  have  a  child  node  for  each  non-terminal  symbol  in  the  rhs. 
However,  terminals  may  be  represented  implicitly,  so  long  as  the  terminal  in  a  particular  position 
in  a  given  rule  is  always  represented  the  same  way.  In  a  given  language,  usually  a  terminal  will 
always  be  represented  explicitly  or  always  be  represented  implicitly,  but  it  is  possible  to  specify  the 
representation  on  a  rule  by  rule  basis.  Figure  1  shows  the  difference  between  explicit  and  implicit 
terminal representation of the rule (stmt → IF expr THEN stmt).

Figure 1: Two IR trees for the rule (stmt → IF expr THEN stmt), one with explicit terminals and one with implicit terminals.

Figure 2: Three IR trees, (a), (b), and (c), for the single rule derivation (expr → identifier).

An IR tree also need not represent every rewrite rule explicitly. A rule can be represented in any
of three ways. Consider the rule (expr → identifier), the possible tree representations of which
are shown in Figure 2. Tree (a) is the strict representation, while trees (b) and (c) are smaller,
easier to use representations. Any rule p whose rhs is not empty and contains no more than one
non-terminal need not be represented by a tree node. Instead, p may be represented by the node
corresponding to the non-terminal in rhs(p), or by the node corresponding to a specified explicitly
represented terminal in the rhs if there is no non-terminal. As a representation of p, this node
may  include  the  rule  as  an  annotation  or  it  may  not,  depending  on  the  precise  IR  specified.  In 
Figure  2,  tree  (b)  includes  the  annotation,  but  tree  (c)  does  not.  The  annotation  must  be  included 
when  rhs(p)  contains  multiple  symbols.  Any  number  of  annotations  may  be  added  to  a  node,  in 
derivation  order.  Thus,  a  rule  may  be  represented  by  a  node,  by  an  annotation  on  a  node,  or,  if  it 
is  a  chain  rule,  by  nothing  at  all. 
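
The difference between explicit and implicit terminal representation can be illustrated with a small Python sketch. It builds both one-rule trees for (stmt → IF expr THEN stmt) and recovers the same frontier from each. The names `Rule`, `Node`, `build`, `frontier`, `size`, and `if_rule` are hypothetical and not part of Ladle, and the per-symbol `kind` tag is only an assumed way to encode the rule-by-rule choice of representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """A rewrite rule; each rhs entry is (symbol, kind).

    kind is "nt" (non-terminal), "explicit" (terminal with its own leaf)
    or "implicit" (terminal recovered from the rule). Assumed encoding.
    """
    lhs: str
    rhs: Tuple[Tuple[str, str], ...]


@dataclass
class Node:
    symbol: str
    rule: Optional[Rule] = None          # None marks a leaf
    children: List["Node"] = field(default_factory=list)


def build(rule: Rule) -> Node:
    """Build the one-rule tree: a leaf for every rhs symbol that is not implicit."""
    leaves = [Node(sym) for sym, kind in rule.rhs if kind != "implicit"]
    return Node(rule.lhs, rule, leaves)


def frontier(node: Node) -> List[str]:
    """Return the rhs of the derivation that the tree represents."""
    if node.rule is None:
        return [node.symbol]
    out: List[str] = []
    kids = iter(node.children)
    for sym, kind in node.rule.rhs:
        if kind == "implicit":
            out.append(sym)              # read from the rule, not from the tree
        else:
            out.extend(frontier(next(kids)))
    return out


def size(node: Node) -> int:
    """Count the nodes in the tree."""
    return 1 + sum(size(child) for child in node.children)


def if_rule(kind: str) -> Rule:
    """The rule (stmt -> IF expr THEN stmt) with both keywords tagged `kind`."""
    return Rule("stmt", (("IF", kind), ("expr", "nt"), ("THEN", kind), ("stmt", "nt")))


explicit_tree = build(if_rule("explicit"))
implicit_tree = build(if_rule("implicit"))
```

Both trees represent the same phrasal form; under this encoding the implicit tree simply has no leaves for IF and THEN.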


1.3  Parsing  and  Unparsing  Syntax  Trees 

The  abstract  grammar  used  to  define  the  syntax  trees  for  a  language  should  be  clean  and  simple. 
However,  there  is  no  guarantee  that  it  can  be  parsed  easily,  or  even  unambiguously.  For  this 
reason,  Ladle  language  descriptions  contain  not  just  the  abstract  grammar,  but  a  second  context 
free  grammar  as  well,  called  the  concrete  grammar. 

The  concrete  grammar  specifies  how  to  convert  an  abstract  phrasal  form  into  an  abstract  syntax 
tree, and vice versa. The former operation is called parsing, and the latter unparsing. The concrete
grammar must be LALR(1). The language specified by the concrete grammar must be the same
one  specified  by  the  abstract  grammar.  In  fact,  the  concrete  grammar  must  be  an  expansion  of  the 
abstract  grammar,  a  concept  defined  in  Section  2.  The  parsing  and  unparsing  operations  are  not 
specified  explicitly,  but  are  implicit  in  the  relationship  between  the  abstract  and  concrete  grammars. 
Typically,  the  concrete  grammar  will  be  similar  to  the  abstract  one,  but  modified  to  include  operator 
precedence, to have good error recovery properties, and to be LALR(1). However, the concrete
grammar can be any expansion of the abstract grammar, including the abstract grammar itself, if
it is LALR(1).


2  Theory 


The  relationship  between  the  abstract  and  concrete  grammars  in  a  Ladle  language  description 
specifies  how  to  parse  an  abstract  phrasal  form  into  an  abstract  derivation,  and  also  how  to  unparse 
an  abstract  derivation  into  an  abstract  phrasal  form.  This  section  defines  the  precise  grammar 
relationship  required,  specifies  the  conversions,  and  provides  algorithms  for  them.  The  approach 
is  to  parse  with  the  concrete  grammar,  converting  derivations  and  symbols  between  the  grammars 
as  necessary.  (This  section  contains  a  great  deal  of  notation.  The  reader  may  wish  to  review 
Appendix  A  before  continuing.) 


2.1  Grammatical  Expansion 

A context-free grammar $\mathcal{G}$ may be said to be an expansion of another context-free grammar $\bar{\mathcal{G}}$ with
respect to a mapping $\psi_0$, which maps terminals and non-terminals of $\mathcal{G}$ onto terminals and non-terminals
of $\bar{\mathcal{G}}$, respectively, and a set $\bar{P}_{cyclic}$ of $\bar{\mathcal{G}}$ rules, each of which has no semantic consequence.
With respect to this expansion, the attributes (e.g. symbols, derivations, etc.) of $\bar{\mathcal{G}}$ are referred to
as abstract attributes and the attributes of $\mathcal{G}$ are concrete; that is, $\bar{\mathcal{G}}$ is the abstract grammar and
$\mathcal{G}$ is the concrete grammar. The conditions that define expansion ensure that the languages of $\mathcal{G}$
and $\bar{\mathcal{G}}$ are identical, modulo the renaming of the mapping $\psi_0$. These conditions further ensure that
a concrete derivation can easily be transformed into an abstract derivation, and vice versa. Since
the rules of $\bar{P}_{cyclic}$ have no semantic consequence, these rules may be added or removed from an
abstract derivation by such transformations.

Informally, to determine whether a concrete grammar $\mathcal{G}$ is an expansion of an abstract grammar $\bar{\mathcal{G}}$
with respect to a mapping $\psi_0$ and a set of abstract rules $\bar{P}_{cyclic}$, perform the following steps:

•  The domain of the mapping $\psi_0$ specifies a set of concrete non-terminals each of which is
equivalent to an abstract non-terminal. Call this set $\mathcal{A}_{base}$.

•  Extend $\mathcal{A}_{base}$ to the set $\mathcal{A}_{cyclic}$ by adding those concrete non-terminals that are cycle equivalent
to non-terminals in $\mathcal{A}_{base}$. The cycle equivalence of two non-terminals is defined below;
informally, each of the concrete non-terminals derives the other in a cycle using only chain
rules and concrete derivations that correspond to abstract rules in $\bar{P}_{cyclic}$. It is for this reason
that the set $\bar{P}_{cyclic}$ must be specified as part of a grammatical expansion.

•  Restrict $\mathcal{A}_{cyclic}$ to the set $\mathcal{A}$ by removing each concrete non-terminal whose only purpose is to
form a cycle, that is, its only rewrite rule is part of the cycle that contains the non-terminal.

•  Extend $\mathcal{A}$ to the set $\mathcal{A}_{chain}$ by adding concrete non-terminals that are chain equivalent to
those in $\mathcal{A}$. The chain equivalence of two non-terminals is defined below, but loosely means
that the non-terminal not in $\mathcal{A}$ chain derives or is chain derived by the non-terminal that
is in $\mathcal{A}$. Each non-terminal in $\mathcal{A}_{cyclic} \setminus \mathcal{A}$ is chain equivalent to some non-terminal in $\mathcal{A}$, so
$\mathcal{A}_{cyclic} \subseteq \mathcal{A}_{chain}$.

•  Construct $\Pi$, the set of concrete derivations each of which rewrites an abstracted concrete non-terminal
as an abstracted phrasal form and which rewrites exactly one abstracted concrete
non-terminal (the first). A phrasal form is abstracted when it contains only symbols in
$(\Sigma \cup \mathcal{A}_{chain})$, and a concrete non-terminal is abstracted when it is in $(\Sigma \cup \mathcal{A}_{chain})$. $\Pi$ is the set
of concrete derivations that rewrite abstracted concrete phrasal forms as abstracted concrete
sentential forms without rewriting any intermediate abstracted concrete phrasal forms.


$\mathcal{G}$ is an expansion of $\bar{\mathcal{G}}$ when the concrete derivations of $\Pi$ correspond precisely to the abstract rules
of $\bar{\mathcal{G}}$, except for the concrete derivations that correspond to null abstract derivations.


Definition:  Grammatical  Expansion 

Let $\bar{\mathcal{G}} = (\bar{N}, \bar{\Sigma}, \bar{P}, \bar{S})$ and $\mathcal{G} = (N, \Sigma, P, S)$ be the abstract and concrete context-free grammars,
respectively.

Let $\mathcal{A}_{base} \subseteq N$ be a subset of concrete non-terminals.

Let $\psi_0 : (\Sigma \cup \mathcal{A}_{base}) \to (\bar{\Sigma} \cup \bar{N})$ be an invertible mapping such that $\psi_0(S) = \bar{S}$ and $\psi_0(Lang(\mathcal{G})) = Lang(\bar{\mathcal{G}})$.
$\mathcal{A}_{base} = \psi_0^{-1}(\bar{N})$ is the set of concrete non-terminals that correspond to abstract non-terminals.

The mapping $\psi_0$ allows the renaming of symbols between the two grammars.

Each abstract non-terminal derives a set of syntactic constructs for the specified language; each
concrete non-terminal $A$ in $\mathcal{A}_{base}$ derives the concrete version of the set of constructs derived by the
abstract non-terminal $\psi_0(A)$. (This statement is deliberately vague: the purpose of the remainder
of this subsection is to couch the statement precisely.)

Let $\bar{P}_{cyclic}$ be any set of abstract rules such that the only effect of applying a rule $\bar{p}$ in $\bar{P}_{cyclic}$ to an
abstract phrasal form is to add some terminal symbols. These rules are cyclic and are chosen to describe
syntactic constructs that may have no semantic importance, such as expression parentheses.

More precisely, each rule $\bar{p}$ in $\bar{P}_{cyclic}$ must have the form $\bar{A} \to \bar{\xi}\bar{A}\bar{\zeta}$, where $\bar{A} \in \bar{N}$ and $\bar{\xi}, \bar{\zeta} \in \bar{\Sigma}^*$,
although $\bar{P}_{cyclic}$ need not contain all such rules. Removing an application of a cyclic abstract rule
from an abstract derivation always yields another derivation.




Example:

$\bar{\mathcal{G}} = (\bar{N}, \bar{\Sigma}, \bar{P}, \bar{S})$, where $\bar{N}$ = {stmt, expr}, $\bar{\Sigma}$ = {name, IF, THEN, ELSE, =, +, *, (, )},
the rules of $\bar{P} = \{\bar{1}, \bar{2}, \bar{3}, \bar{4}, \bar{5}, \bar{6}, \bar{7}\}$ are

    $\bar{1}$   stmt  →  name = expr
    $\bar{2}$         |  IF expr THEN stmt
    $\bar{3}$         |  IF expr THEN stmt ELSE stmt
    $\bar{4}$   expr  →  ( expr )
    $\bar{5}$         |  expr + expr
    $\bar{6}$         |  expr * expr
    $\bar{7}$         |  name

and $\bar{S}$ = stmt.

$\mathcal{G} = (N, \Sigma, P, S)$, where $\Sigma$ = {name, IF, THEN, ELSE, =, +, *, (, )},
$N$ = {stmt, assignment, ifstmt, elseclause, sup-expr, expr, term, factor, primary},
the rules in $P$ = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14} are

     1   stmt        →  assignment
     2               |  ifstmt
     3   assignment  →  name = sup-expr
     4   ifstmt      →  IF sup-expr THEN stmt elseclause
     5   elseclause  →
     6               |  ELSE stmt
     7   sup-expr    →  expr
     8   expr        →  expr + term
     9               |  term
    10   term        →  term * factor
    11               |  factor
    12   factor      →  primary
    13   primary     →  name
    14               |  ( expr )

and $S$ = stmt.

$\mathcal{A}_{base}$ = {stmt, expr}.

$\forall X \in (\Sigma \cup \mathcal{A}_{base}),\ \psi_0(X) = \bar{X}$.

$\bar{P}_{cyclic} = \{\bar{4}\}$.

These grammars will be used as examples throughout the paper.
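
For readers who want to experiment with the example, the two grammars can be written down as plain data. The Python sketch below is only an illustration: the dictionary layout and the helper names `has_cyclic_form` and `chain_rules` are assumptions, not part of Ladle. It checks which abstract rules have the shape required of a cyclic rule (a single occurrence of the lhs surrounded by terminals) and lists the concrete chain rules, i.e. rules whose rhs is a single non-terminal.

```python
# Abstract grammar of the example: rule number -> (lhs, rhs symbols).
ABSTRACT_NONTERMINALS = {"stmt", "expr"}
ABSTRACT_RULES = {
    1: ("stmt", ("name", "=", "expr")),
    2: ("stmt", ("IF", "expr", "THEN", "stmt")),
    3: ("stmt", ("IF", "expr", "THEN", "stmt", "ELSE", "stmt")),
    4: ("expr", ("(", "expr", ")")),
    5: ("expr", ("expr", "+", "expr")),
    6: ("expr", ("expr", "*", "expr")),
    7: ("expr", ("name",)),
}

# Concrete grammar of the example; rule 5 has an empty rhs.
CONCRETE_NONTERMINALS = {
    "stmt", "assignment", "ifstmt", "elseclause", "sup-expr",
    "expr", "term", "factor", "primary",
}
CONCRETE_RULES = {
    1: ("stmt", ("assignment",)),
    2: ("stmt", ("ifstmt",)),
    3: ("assignment", ("name", "=", "sup-expr")),
    4: ("ifstmt", ("IF", "sup-expr", "THEN", "stmt", "elseclause")),
    5: ("elseclause", ()),
    6: ("elseclause", ("ELSE", "stmt")),
    7: ("sup-expr", ("expr",)),
    8: ("expr", ("expr", "+", "term")),
    9: ("expr", ("term",)),
    10: ("term", ("term", "*", "factor")),
    11: ("term", ("factor",)),
    12: ("factor", ("primary",)),
    13: ("primary", ("name",)),
    14: ("primary", ("(", "expr", ")")),
}


def has_cyclic_form(lhs, rhs, nonterminals):
    """True when the only non-terminal in rhs is a single occurrence of lhs."""
    return [sym for sym in rhs if sym in nonterminals] == [lhs]


def chain_rules(rules, nonterminals):
    """Numbers of the rules whose rhs is exactly one non-terminal."""
    return sorted(
        number
        for number, (_lhs, rhs) in rules.items()
        if len(rhs) == 1 and rhs[0] in nonterminals
    )


CYCLIC_CANDIDATES = sorted(
    number
    for number, (lhs, rhs) in ABSTRACT_RULES.items()
    if has_cyclic_form(lhs, rhs, ABSTRACT_NONTERMINALS)
)
CONCRETE_CHAIN_RULES = chain_rules(CONCRETE_RULES, CONCRETE_NONTERMINALS)
```

Having the cyclic shape is only a necessary condition: which of the candidate rules actually belong to the cyclic set remains a choice made by the author of the specification.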

Two concrete non-terminals are cycle equivalent with respect to the abstract grammar when each
derives the other using a cyclic concrete derivation, and some uniqueness constraints are satisfied.
However, cyclic concrete derivations will not be defined until Section 2.2.1, since the definition
depends on the set $\Pi$, which cannot be defined yet. Therefore, for the moment a concrete
derivation $\pi \in P^*$ is said to be cyclic when there is an abstract derivation $\bar{\pi} \in \bar{P}^*_{cyclic}$ such that
$\psi_0(lhs(\pi)) \stackrel{\bar{\pi}}{\Rightarrow} \psi_0(rhs(\pi))$. ($P^*$ is defined in Appendix A.)

When two concrete non-terminals are cycle equivalent, both of them derive the same set of syntactic
constructs, although there may be slight differences due to cyclic concrete derivations. Furthermore,
they are characterized by a uniqueness condition on corresponding abstract derivations.

For each concrete non-terminal, $Cycled$ defines the unique concrete non-terminal in $\mathcal{A}_{base}$ which is
cycle equivalent to it, if there is one.

Define $Cycled : N \to (\mathcal{A}_{base} \cup \{\bot\})$ by


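$$\forall X \in N,\quad Cycled(X) = \begin{cases}
X & \text{if } X \in \mathcal{A}_{base} \\
A & \text{if } \exists! A \in \mathcal{A}_{base},\ \bigl(A \underset{P^*_{\mathcal{A}_{base}}}{\Rightarrow} \xi_0 X \zeta_0 \underset{P^*_{\mathcal{A}_{base}}}{\Rightarrow} \xi_0 \xi_1 A \zeta_1 \zeta_0, \text{ where } \xi_0, \xi_1, \zeta_0, \zeta_1 \in \Sigma^* \text{ and } \psi_0(A) \underset{\bar{P}^*_{cyclic}}{\Rightarrow} \psi_0(\xi_0 \xi_1 A \zeta_1 \zeta_0)\bigr) \\
\bot & \text{otherwise.}
\end{cases}$$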


$\mathcal{A}_{cyclic}$ is the set of concrete non-terminals which are cycle equivalent to a non-terminal in $\mathcal{A}_{base}$.
Note that it is not possible for two non-terminals in $\mathcal{A}_{base}$ to be cycle equivalent to each other.
Define $\mathcal{A}_{cyclic} \subseteq N$ by $\mathcal{A}_{cyclic} = Cycled^{-1}(\mathcal{A}_{base})$.


Example:

Cycled({stmt}) = {stmt}.

Cycled({expr, term, factor, primary}) = {expr}.

Cycled({assignment, ifstmt, elseclause, sup-expr}) = {$\bot$}.

$\mathcal{A}_{cyclic}$ = {stmt, expr, term, factor, primary}.

Note that (expr $\Rightarrow^*$ factor + factor), while (term $\Rightarrow^*$ ( factor + factor )). The derivations differ syntactically
only by the presence of the parentheses in the latter, which results from the cyclic concrete
derivation corresponding to the abstract rule $\bar{4}$.


Two  concrete  non-terminals  X  and  A  can  be  chain  equivalent  to  each  other  in  two  different  ways. 
First  of  all,  when  A  is  the  only  concrete  phrasal  form  that  can  be  derived  from  X  without  rewriting 
a non-terminal in $\mathcal{A}_{cyclic}$ after the first rewrite, X and A are chain equivalent. Secondly, for a given
X, if no non-terminal satisfies the first condition and A is the only non-terminal which can derive
a phrasal form containing X without rewriting a non-terminal in $\mathcal{A}_{cyclic}$ after the first rewrite and
the  only  phrasal  form  so  derived  is  X,  then  X  and  A  are  also  chain  equivalent. 

If  X  and  A  are  chain  equivalent,  every  terminal  string  derived  from  X  is  also  derived  from  A. 


$\mathcal{A}$ is the set of concrete non-terminals each of which is cycle equivalent to a non-terminal in $\mathcal{A}_{base}$
but is not chain equivalent to any non-terminal in $\mathcal{A}_{cyclic}$ other than itself.

Define  A  C  N  by 


A= 


A  £  Acyclic 


(3 ]p  £  P  such  that  lhs(p)  =  A)  and 
A  =>  7  4  B,  where  B  £  Acydic 

P-^A  , 

** cyclic 


and  Cycled(B)  =  Cycled(A) 


$\mathcal{A}_{base}$ is not necessarily a subset of $\mathcal{A}$. In fact, $S$ itself may not be in $\mathcal{A}$.

Example:

$\mathcal{A}$ = {stmt, expr, term, primary}.

The concrete non-terminal factor is not in $\mathcal{A}$ because it is chain equivalent to primary.

To illustrate why $\mathcal{A}_{base}$ may not be a subset of $\mathcal{A}$, consider the following modification of the example:
Suppose that $\mathcal{A}_{base}$ was defined as {stmt, factor} instead of {stmt, expr}.

Both $\mathcal{A}_{cyclic}$ and $\mathcal{A}$ would still be the same. (In fact, everything that has been or will be defined
would still be the same, except for $\mathcal{A}_{base}$.) However, now $\mathcal{A}_{base} \not\subseteq \mathcal{A}$.

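$Chained$ and $\mathcal{A}_{chain}$ define the set of concrete non-terminals that are chain equivalent to the
non-terminals of $\mathcal{A}$.

Define $Chained : N \to (\mathcal{A} \cup \{\bot\})$ by
$\forall X \in N$,

    if $X \in \mathcal{A}$:  $Chained(X) = X$

    else if $(\exists! \alpha \in (\Sigma \cup \mathcal{A})^*,\ X \Rightarrow \alpha)$ and $\alpha = A$:  $Chained(X) = A$

    else if $(\exists! A \in \mathcal{A},\ A \Rightarrow \xi X \zeta$ and $\xi, \zeta \in (\Sigma \cup \mathcal{A})^*)$ and $\xi = \zeta = \varepsilon$:  $Chained(X) = A$

    otherwise:  $Chained(X) = \bot$.

The use of “else” and “$\exists!$” in this definition, rather than “or” and “$\exists$”, ensures that a concrete
non-terminal is chain equivalent to at most one non-terminal of $\mathcal{A}$.

Define $\mathcal{A}_{chain} \subseteq N$ by $\mathcal{A}_{chain} = Chained^{-1}(\mathcal{A})$.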


Example:

By the first condition,

Chained(stmt) = stmt
Chained(expr) = expr
Chained(term) = term
Chained(primary) = primary.

By the second condition,

Chained(sup-expr) = expr
Chained(factor) = primary.

By the third condition,

Chained(assignment) = stmt
Chained(ifstmt) = stmt.

By the final condition,

Chained(elseclause) = $\bot$.

$\mathcal{A}_{chain}$ = {stmt, assignment, ifstmt, sup-expr, expr, term, factor, primary}.

Note that $\mathcal{A} \subseteq \mathcal{A}_{cyclic} \subseteq \mathcal{A}_{chain} \subseteq N$, since $\mathcal{A}_{cyclic} \setminus \mathcal{A}$ contains only concrete non-terminals that are
chain equivalent to non-terminals of $\mathcal{A}$, and therefore $\mathcal{A}_{cyclic} \setminus \mathcal{A} \subseteq \mathcal{A}_{chain}$.


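The mapping $\psi$ extends the renaming mapping $\psi_0$ using the concrete non-terminal equivalences
just defined.

Define $\psi : (\Sigma \cup \mathcal{A}_{chain}) \to (\bar{\Sigma} \cup \bar{N})$ by

$$\forall X \in (\Sigma \cup \mathcal{A}_{chain}),\quad \psi(X) = \begin{cases}
\psi_0(X) & \text{if } X \in (\Sigma \cup \mathcal{A}_{base}) \\
\psi_0(Cycled(X)) & \text{if } X \in (\mathcal{A} \setminus \mathcal{A}_{base}) \\
\psi_0(Cycled(Chained(X))) & \text{if } X \in (\mathcal{A}_{chain} \setminus (\mathcal{A} \cup \mathcal{A}_{base})).
\end{cases}$$

Note that on $\mathcal{A}_{chain}$, $\psi = \psi_0 \circ Cycled \circ Chained$.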

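The concrete symbols in the set $(\Sigma \cup \mathcal{A}_{chain})$ are said to be abstracted, as each of them is mapped
onto an abstract symbol. An abstracted concrete phrasal form is one that contains only abstracted
concrete symbols.


Example:

$\mathcal{A}_{base}$ = {stmt, expr}.

$\psi$(stmt) = $\overline{\text{stmt}}$.

$\psi$(expr) = $\overline{\text{expr}}$.

$\mathcal{A} \setminus \mathcal{A}_{base}$ = {term, primary}.

$\psi$(term) = $\overline{\text{expr}}$.

$\psi$(primary) = $\overline{\text{expr}}$.

$\mathcal{A}_{chain} \setminus (\mathcal{A} \cup \mathcal{A}_{base})$ = {assignment, ifstmt, sup-expr, factor}.

$\psi$(assignment) = $\overline{\text{stmt}}$.

$\psi$(ifstmt) = $\overline{\text{stmt}}$.

$\psi$(sup-expr) = $\overline{\text{expr}}$.

$\psi$(factor) = $\overline{\text{expr}}$.

$N \setminus \mathcal{A}_{chain}$ = {elseclause}, so $\psi$ is not defined on the concrete non-terminal elseclause.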
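
The values in this example can be checked mechanically. The Python sketch below is an illustration only: it copies the Cycled and Chained results from the earlier examples into lookup tables (with `None` standing for the undefined value), assumes the identity renaming of the running example, and compares the case-by-case definition of the extended mapping with the composition of the renaming, Cycled, and Chained. The names `psi0`, `psi`, and `psi_composed` are hypothetical.

```python
BOTTOM = None  # stands for the undefined value

TERMINALS = {"name", "IF", "THEN", "ELSE", "=", "+", "*", "(", ")"}
A_BASE = {"stmt", "expr"}
A = {"stmt", "expr", "term", "primary"}

# Results taken from the Cycled and Chained examples.
CYCLED = {
    "stmt": "stmt", "expr": "expr", "term": "expr",
    "factor": "expr", "primary": "expr",
    "assignment": BOTTOM, "ifstmt": BOTTOM,
    "elseclause": BOTTOM, "sup-expr": BOTTOM,
}
CHAINED = {
    "stmt": "stmt", "expr": "expr", "term": "term", "primary": "primary",
    "sup-expr": "expr", "factor": "primary",
    "assignment": "stmt", "ifstmt": "stmt",
    "elseclause": BOTTOM,
}

# The chain-equivalent set is the preimage of A under Chained.
A_CHAIN = {x for x, image in CHAINED.items() if image is not BOTTOM}


def psi0(symbol):
    """Identity renaming: abstract symbols are assumed to share the concrete names."""
    return symbol


def psi(symbol):
    """The extended mapping, written case by case as in the definition."""
    if symbol in TERMINALS or symbol in A_BASE:
        return psi0(symbol)
    if symbol in A:
        return psi0(CYCLED[symbol])
    if symbol in A_CHAIN:
        return psi0(CYCLED[CHAINED[symbol]])
    raise KeyError(f"psi is not defined on {symbol!r}")


def psi_composed(symbol):
    """The same mapping on non-terminals, written as a composition."""
    return psi0(CYCLED[CHAINED[symbol]])
```

Calling `psi("elseclause")` raises `KeyError`, which mirrors the fact that elseclause is the one concrete non-terminal that is not abstracted.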

$\Pi$ is the set of rightmost concrete derivations that rewrite an abstracted non-terminal as an
abstracted phrasal form without rewriting any other abstracted non-terminals after the first.

$\Pi_{base}$ is $\Pi$ without certain chain derivations.

Define 


1 

f 

{A  4-  A'  =5-  a'  a)  G  PP*A ,  where  A  G  A,  A'  G  Achain, 

*  €  PP'^ 

a’  G  (2u  Achain)*,  a  6  (2  U  A)*,ip{A')  =  ip(A),ip(a ')  =  ip(a),  1 
and  7r  is  rightmost 

Define  n=nbase  U  {it  G  PP*a  \A  ^  B  where  A,B  G  Achain,  rp(A)  =  ip(B ),  and  t  is  rightmost}. 

$\Pi$ and $\Pi_{base}$ contain only rightmost derivations so that no two derivations in the same set perform
the same rewrite in different orders.

$\Pi_{base}$, and hence $\Pi$, can be infinite in size. However, the next requirement imposed by grammatical
abstraction will ensure that the derivations of $\Pi_{base}$ and $\Pi$ have a finite number of distinct lhs and
rhs. Further, $\Pi_{base}$ and $\Pi$ can be infinite in size only if $\mathcal{G}$ is ambiguous.


Example:

$\Pi_{base}$ = {3, 4:5, 4:6, 8, 9, 10, 11, 13, 14}.

(The derivation consisting of rule 4 followed by rule 5 is represented by “4:5”, since “45” is potentially
ambiguous.)

elseclause is the only concrete non-terminal that is not abstracted, so only rules containing
elseclause in their rhs’s result in $\Pi$ derivations of length greater than one.

$\Pi = \Pi_{base} \cup \{1, 2, 7, 12\}$.

$\Pi^*$ is the set of concrete derivations that both rewrite and yield abstracted concrete phrasal forms.

$\mathcal{G}$ is a grammatical expansion of $\bar{\mathcal{G}}$ if and only if both of the following hold:

For every $(\bar{A} \to \bar{\alpha}) \in \bar{P}$, there is a unique $\pi \in \Pi$ such that $A \stackrel{\pi}{\Rightarrow} \alpha$, $\psi(A) = \bar{A}$, and $\psi(\alpha) = \bar{\alpha}$.

For every $\pi \in \Pi$, where $A \stackrel{\pi}{\Rightarrow} \alpha$, either $\psi(A) = \psi(\alpha)$ or $(\psi(A) \to \psi(\alpha)) \in \bar{P}$.

Or, less precisely, $\mathcal{G}$ is a grammatical expansion of $\bar{\mathcal{G}}$ when for each basic concrete derivation from
abstracted non-terminal to abstracted phrasal form there is a unique abstract rule, and vice versa.

For the remainder of this section, let $\bar{\mathcal{G}}$ and $\mathcal{G}$ be context-free grammars such that $\mathcal{G}$ is an expansion
of $\bar{\mathcal{G}}$.


2.2  Contract  and  Expand 

Any concrete derivation that satisfies certain constraints can be converted into a precisely corresponding
abstract derivation. Any abstract derivation at all can be converted into an almost
precisely corresponding concrete derivation. This concrete derivation satisfies the conversion constraints
on concrete derivations. However, the concrete derivation may contain additional rule
sequences  which  correspond  to  cyclic  rules  not  present  in  the  abstract  derivation.  This  section 
shows  how  to  obtain  these  conversions. 


2.2.1  $\Pi$ Subsets


There are concrete derivations in $\Pi$ that correspond to null abstract derivations rather than to
abstract rules.

Definition: Trivial Concrete Derivations

$\Pi_{trivial}$ is the set of concrete derivations in $\Pi$ that do not correspond to an abstract rule.
Define $\Pi_{trivial} \subseteq \Pi$ by

$\Pi_{trivial} = \{\pi \in \Pi \mid A \stackrel{\pi}{\Rightarrow} \alpha$, where $A \in \mathcal{A}_{chain}$, $\alpha \in (\Sigma \cup \mathcal{A}_{chain})^*$, and $\psi(A) = \psi(\alpha)\}$.

For all $\pi \in \Pi_{trivial}$, $A \stackrel{\pi}{\Rightarrow} B$, where $A, B \in \mathcal{A}_{chain}$ and $\psi(A) = \psi(B)$.
Note that while $\Pi \setminus \Pi_{base} \subseteq \Pi_{trivial}$, the reverse need not be true.

Example:

$\Pi_{trivial} = \{1, 2, 7, 9, 11, 12\} \supsetneq \{1, 2, 7, 12\} = \Pi \setminus \Pi_{base}$.

There are concrete derivations in $\Pi$ corresponding to cyclic abstract rules.

Definition: Cyclic Concrete Derivations

$\Pi_{cyclic}$ is the set of concrete derivations that correspond to abstract rules in $\bar{P}_{cyclic}$.
Define $\Pi_{cyclic} = \{\pi \in \Pi \mid Contract(\pi) \in \bar{P}_{cyclic}\} \subseteq \Pi$.

Note that $\Pi_{cyclic} \cap \Pi_{trivial} = \emptyset$.

Example:

$\Pi_{cyclic}$ = {14}.


2.2.2  Contraction 

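A concrete derivation can be transformed into a corresponding abstract derivation.

Definition: Concrete Derivation Contraction

Define $Contract : \Pi \to \bar{P} \cup \{0_{\bar{A}} \mid \bar{A} \in \bar{N}\}$ by

$$\forall \pi \in \Pi, \text{ where } A \stackrel{\pi}{\Rightarrow} \alpha,\quad Contract(\pi) = \begin{cases}
(\psi(A) \to \psi(\alpha)) & \text{if } \psi(A) \neq \psi(\alpha) \\
0_{\psi(A)} & \text{if } \psi(A) = \psi(\alpha).
\end{cases}$$

Extend $Contract$ to $Contract : \Pi^* \to \bar{P}^*$ by

$$Contract\bigl((A \stackrel{\pi_0}{\Rightarrow} \xi B \zeta \stackrel{\pi_1}{\Rightarrow} \xi \beta \zeta)\bigr) = \bigl(\psi(A) \stackrel{Contract(\pi_0)}{\Rightarrow} \psi(\xi B \zeta) \stackrel{Contract(\pi_1)}{\Rightarrow} \psi(\xi \beta \zeta)\bigr)$$

and

$$Contract(0_A) = 0_{\psi(A)},$$

where $(A \stackrel{\pi_0}{\Rightarrow} \xi B \zeta), (B \stackrel{\pi_1}{\Rightarrow} \beta) \in \Pi^*$.

Note that if $\pi \in \Pi^*$ is a concrete derivation such that $A \stackrel{\pi}{\Rightarrow} \alpha$, where $A \in \mathcal{A}_{chain}$ and $\alpha \in (\Sigma \cup \mathcal{A}_{chain})^*$,
then $\psi(A) \stackrel{Contract(\pi)}{\Rightarrow} \psi(\alpha)$.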

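Example:

    $\pi$       $Contract(\pi)$
    1       $0_{\overline{\text{stmt}}}$
    2       $0_{\overline{\text{stmt}}}$
    3       $\bar{1}$
    4:5     $\bar{2}$
    4:6     $\bar{3}$
    7       $0_{\overline{\text{expr}}}$
    8       $\bar{5}$
    9       $0_{\overline{\text{expr}}}$
    10      $\bar{6}$
    11      $0_{\overline{\text{expr}}}$
    12      $0_{\overline{\text{expr}}}$
    13      $\bar{7}$
    14      $\bar{4}$
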
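
The table above can be used directly to contract a whole concrete derivation. The following Python sketch is an illustration under stated assumptions: a concrete derivation is represented as a list of the derivation labels from the table, null abstract derivations are treated as contributing no abstract rule, and the cyclic abstract rule set is taken to be rule 4 alone, as the cyclic example in Section 2.2.1 implies. The names `CONTRACT`, `contract`, `is_null`, `PI_TRIVIAL`, and `PI_CYCLIC` are hypothetical.

```python
NULL = "null"  # tag for a null abstract derivation on the named non-terminal

# Contract for the example: concrete derivation label -> abstract rule number,
# or (NULL, non-terminal) for a null abstract derivation.
CONTRACT = {
    "1": (NULL, "stmt"),
    "2": (NULL, "stmt"),
    "3": 1,
    "4:5": 2,
    "4:6": 3,
    "7": (NULL, "expr"),
    "8": 5,
    "9": (NULL, "expr"),
    "10": 6,
    "11": (NULL, "expr"),
    "12": (NULL, "expr"),
    "13": 7,
    "14": 4,
}

P_CYCLIC = {4}  # assumption: the only cyclic abstract rule is expr -> ( expr )


def is_null(image):
    """True when a table entry denotes a null abstract derivation."""
    return isinstance(image, tuple)


def contract(derivation):
    """Contract a list of derivation labels to the list of abstract rules applied."""
    images = [CONTRACT[step] for step in derivation]
    return [image for image in images if not is_null(image)]


# Derivations with a null image are the trivial ones; those whose image is a
# cyclic abstract rule are the cyclic ones.
PI_TRIVIAL = {step for step, image in CONTRACT.items() if is_null(image)}
PI_CYCLIC = {
    step
    for step, image in CONTRACT.items()
    if not is_null(image) and image in P_CYCLIC
}

# Concrete derivation of "name = name" from stmt:
# stmt, assignment, name = sup-expr, name = expr, name = term,
# name = factor, name = primary, name = name.
ASSIGNMENT_DERIVATION = ["1", "3", "7", "9", "11", "12", "13"]
```

For the assignment derivation, contraction leaves only abstract rules 1 and 7, i.e. the abstract derivation stmt → name = expr → name = name.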

2.2.3  Expansion 

Any two concrete non-terminals in $\mathcal{A}$ that map onto the same abstract non-terminal $\bar{A}$ can derive
each other in $\mathcal{G}$, possibly with some terminals added, using only trivial and cyclic derivations.
This property is needed when transforming an abstract derivation into a corresponding concrete
derivation: an abstract non-terminal in a given context may correspond to one concrete non-terminal
as the lhs of a concrete sub-derivation and to a different concrete non-terminal as part of
the rhs of a concrete sub-derivation. Thus, the corresponding concrete derivation must include a
trivial/cyclic sub-derivation that rewrites the one concrete non-terminal as the other, possibly with
some terminals added.

Definition: Concrete Non-Terminal Coercion

Let $B, C \in \mathcal{A}$ such that $\psi(B) = \psi(C)$ be given.

Then there is at least one concrete derivation $\pi \in (\Pi_{trivial} \cup \Pi_{cyclic})^*$ such that $B \stackrel{\pi}{\Rightarrow} \xi C \zeta$, where
$\xi, \zeta \in \Sigma^*$ and $Contract(\pi) \in \bar{P}^*_{cyclic}$.

Define $Coerce(B, C) = \pi$, where $\pi$ is arbitrarily chosen from among the shortest such derivations.

Example:

Note that even though $\psi$(sup-expr) = $\psi$(expr), sup-expr $\notin \mathcal{A}$, so Coerce(sup-expr, expr) is undefined.

An  abstract  derivation  can  be  transformed  into  a  corresponding  concrete  derivation.  The  resulting 
concrete  derivation  may  have  more  cyclic  derivations  than  the  abstract  derivation  had  cyclic  rules, 
but  each  rule  in  the  abstract  derivation  will  correspond  to  a  non-trivial  and  non-cyclic  sub-derivation 
of  the  concrete  derivation,  and  vice  versa. 


An  abstract  rule  can  be  transformed  into  a  corresponding  concrete  derivation. 


DEFINITION:  Abstract  Rule  Expansion 


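Define $\mathit{Rule\text{-}Expand} : \bar{P} \to \Pi^*$ as follows:

Let $\bar{p} \in \bar{P}$ be an abstract rule such that $\bar{A} \stackrel{\bar{p}}{\Rightarrow} \bar{\alpha}$, where $\bar{A} \in \bar{N}$ and $\bar{\alpha} \in (\bar{\Sigma} \cup \bar{N})^*$.

By the definition of grammatical expansion, there is a unique concrete derivation $\pi_1 \in \Pi$ such that
$A' \stackrel{\pi_1}{\Rightarrow} \alpha'$, where $A' \in \mathcal{A}_{chain}$, $\alpha' \in (\Sigma \cup \mathcal{A}_{chain})^*$, $\psi(A') = \bar{A}$, and $\psi(\alpha') = \bar{\alpha}$.

In fact, $\pi_1 \in \Pi_{base} \subseteq \Pi$.

By the definitions of $\Pi_{base}$ and $\Pi_{trivial}$, there are rightmost derivations $\pi_0, \pi_2 \in \Pi_{trivial}^*$ such that
$A \stackrel{\pi_0}{\Rightarrow} A' \stackrel{\pi_1}{\Rightarrow} \alpha' \stackrel{\pi_2}{\Rightarrow} \alpha$, where $A \in \mathcal{A}$, $\alpha \in (\Sigma \cup \mathcal{A})^*$, $\psi(A) = \psi(A')$, and $\psi(\alpha) = \psi(\alpha')$.
By the definition of Chained and the unambiguity of $\mathcal{G}$, $\pi_0$ and $\pi_2$ are unique.

Set $\mathit{Rule\text{-}Expand}(\bar{p}) = \pi = \langle A \stackrel{\pi_0}{\Rightarrow} A' \stackrel{\pi_1}{\Rightarrow} \alpha' \stackrel{\pi_2}{\Rightarrow} \alpha \rangle$.

$\mathit{Contract}(\pi) = \mathit{Contract}(\langle A \stackrel{\pi_0}{\Rightarrow} A' \stackrel{\pi_1}{\Rightarrow} \alpha' \stackrel{\pi_2}{\Rightarrow} \alpha \rangle)$
$= \langle \psi(A) \stackrel{\mathit{Contract}(\pi_0)}{\Longrightarrow} \psi(A') \stackrel{\mathit{Contract}(\pi_1)}{\Longrightarrow} \psi(\alpha') \stackrel{\mathit{Contract}(\pi_2)}{\Longrightarrow} \psi(\alpha) \rangle$
$= \mathit{Contract}(\pi_1) = \bar{p}$, since $\pi_0, \pi_2 \in \Pi_{trivial}^*$.

Note that $\mathit{lhs}(\pi) \in \mathcal{A}$, $\mathit{rhs}(\pi) \in (\Sigma \cup \mathcal{A})^*$, $\psi(\mathit{lhs}(\pi)) = \bar{A}$, and $\psi(\mathit{rhs}(\pi)) = \bar{\alpha}$.

∎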


Any  abstract  derivation  can  be  transformed  into  a  corresponding  concrete  derivation. 

Definition:  Abstract  Derivation  Expansion 

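Let $\bar{\pi} \in \bar{P}^*$ be an abstract derivation such that $\bar{A} \stackrel{\bar{\pi}}{\Rightarrow} \bar{\alpha}$, where $\bar{A} \in \bar{N}$ and $\bar{\alpha} \in (\bar{\Sigma} \cup \bar{N})^*$.

Then there exist one or more derivations $\pi \in \Pi^*$ in $\mathcal{G}$ and $\bar{\pi}' = \mathit{Contract}(\pi)$ in $\bar{\mathcal{G}}$ such that
$A \stackrel{\pi}{\Rightarrow} \alpha$ and $\bar{A} \stackrel{\bar{\pi}'}{\Rightarrow} \bar{\alpha}'$, where $A \in \mathcal{A}$, $\alpha \in (\Sigma \cup \mathcal{A})^*$, $\bar{\alpha}' \in (\bar{\Sigma} \cup \bar{N})^*$, $\psi(A) = \bar{A}$, $\bar{\alpha}' = \psi(\alpha)$, and

$\bar{\pi} \setminus P_{cyclic} = \bar{\pi}' \setminus P_{cyclic}$.

$\mathit{Expand}(\bar{\pi})$ will be defined as the derivation $\pi$ such that $\pi$ is any of the shortest derivations that
satisfies the given conditions.

Expand is constructed inductively on the length n of $\bar{\pi}$.

Base case (n = 0):

Define $\mathit{Expand}(\emptyset_{\bar{A}}) = \emptyset_A$, where $A \in (\psi \cap (\mathcal{A} \times \bar{N}))^{-1}(\bar{A})$.

If there is more than one such A, $\mathit{Expand}(\emptyset_{\bar{A}})$ is non-deterministically defined as each of them.
The claim is trivially true.

Secondary base case (n = 1):

Let $\bar{p} \in \bar{P}$ be an abstract derivation of length 1 such that $\bar{A} \stackrel{\bar{p}}{\Rightarrow} \bar{\alpha}$, where $\bar{A} \in \bar{N}$ and $\bar{\alpha} \in (\bar{\Sigma} \cup \bar{N})^*$.

Set $\mathit{Expand}(\bar{p}) = \mathit{Rule\text{-}Expand}(\bar{p})$.

The claim follows from the definition of Rule-Expand.

Inductive case (n > 1):

Let $\bar{\pi} \in \bar{P}^*$ be an abstract derivation of length n such that $\bar{A} \stackrel{\bar{\pi}}{\Rightarrow} \bar{\alpha}$, where $\bar{A} \in \bar{N}$ and $\bar{\alpha} \in (\bar{\Sigma} \cup \bar{N})^*$.
Let $\langle \bar{A} \stackrel{\bar{\pi}}{\Rightarrow} \bar{\alpha} \rangle = \langle \bar{A} \stackrel{\bar{\pi}_0}{\Rightarrow} \bar{\gamma}\bar{B}\bar{\delta} \stackrel{\bar{p}}{\Rightarrow} \bar{\gamma}\bar{\beta}\bar{\delta} \rangle$, where $\bar{\pi}_0 \in \bar{P}^*$, $\bar{p} \in \bar{P}$, $\bar{B} \in \bar{N}$, and $\bar{\gamma}, \bar{\delta}, \bar{\beta} \in (\bar{\Sigma} \cup \bar{N})^*$.

By the inductive hypothesis, there are derivations $\pi_0 \in \Pi^*$ in $\mathcal{G}$ and $\bar{\pi}_0' = \mathit{Contract}(\pi_0)$ in $\bar{\mathcal{G}}$ such
that $A \stackrel{\pi_0}{\Rightarrow} \gamma B \delta$ and $\bar{A} \stackrel{\bar{\pi}_0'}{\Rightarrow} \bar{\gamma}'\bar{B}\bar{\delta}'$, where $A, B \in \mathcal{A}$, $\gamma, \delta \in (\Sigma \cup \mathcal{A})^*$, $\bar{\gamma}', \bar{\delta}' \in (\bar{\Sigma} \cup \bar{N})^*$, $\psi(A) = \bar{A}$,
$\bar{\gamma}'\bar{B}\bar{\delta}' = \psi(\gamma B \delta)$, and $\bar{\pi}_0' \setminus P_{cyclic} = \bar{\pi}_0 \setminus P_{cyclic}$.

Define $\pi_2 \in \Pi^*$ by $\pi_2 = \mathit{Rule\text{-}Expand}(\bar{p})$.

By the definition of Rule-Expand, $C \stackrel{\pi_2}{\Rightarrow} \beta$, where $C \in \mathcal{A}$, $\beta \in (\Sigma \cup \mathcal{A})^*$, $\psi(C) = \bar{B}$, and $\psi(\beta) = \bar{\beta}$.

Define the concrete derivation $\pi_1 = \mathit{Coerce}(B, C) \in (\Pi_{trivial} \cup \Pi_{cyclic})^*$.

By the definition of Coerce, $B \stackrel{\pi_1}{\Rightarrow} \xi C \zeta$, where $\xi, \zeta \in \Sigma^*$ and $\bar{\pi}_1 = \mathit{Contract}(\pi_1) \in P_{cyclic}^*$.

Define $\pi = \langle A \stackrel{\pi_0}{\Rightarrow} \gamma B \delta \stackrel{\pi_1}{\Rightarrow} \gamma\xi C\zeta\delta \stackrel{\pi_2}{\Rightarrow} \gamma\xi\beta\zeta\delta \rangle$ and $\alpha = \gamma\xi\beta\zeta\delta \in (\Sigma \cup \mathcal{A})^*$.

Define $\bar{\pi}' = \mathit{Contract}(\pi) = \mathit{Contract}(\langle A \stackrel{\pi_0}{\Rightarrow} \gamma B \delta \stackrel{\pi_1}{\Rightarrow} \gamma\xi C\zeta\delta \stackrel{\pi_2}{\Rightarrow} \gamma\xi\beta\zeta\delta \rangle)$
$= \langle \bar{A} \stackrel{\mathit{Contract}(\pi_0)}{\Longrightarrow} \bar{\gamma}'\bar{B}\bar{\delta}' \stackrel{\mathit{Contract}(\pi_1)}{\Longrightarrow} \bar{\gamma}'\psi(\xi)\bar{B}\psi(\zeta)\bar{\delta}' \stackrel{\mathit{Contract}(\pi_2)}{\Longrightarrow} \bar{\gamma}'\psi(\xi)\bar{\beta}\psi(\zeta)\bar{\delta}' \rangle$
$= \langle \bar{A} \stackrel{\bar{\pi}_0'}{\Rightarrow} \bar{\gamma}'\bar{B}\bar{\delta}' \stackrel{\bar{\pi}_1}{\Rightarrow} \bar{\gamma}'\psi(\xi)\bar{B}\psi(\zeta)\bar{\delta}' \stackrel{\bar{p}}{\Rightarrow} \bar{\gamma}'\psi(\xi)\bar{\beta}\psi(\zeta)\bar{\delta}' \rangle$.

Define $\bar{\alpha}' = \psi(\alpha) = \psi(\gamma\xi\beta\zeta\delta) = \bar{\gamma}'\psi(\xi)\bar{\beta}\psi(\zeta)\bar{\delta}' \in (\bar{\Sigma} \cup \bar{N})^*$.
Clearly $A \stackrel{\pi}{\Rightarrow} \alpha$ and $\bar{A} \stackrel{\bar{\pi}'}{\Rightarrow} \bar{\alpha}'$.

$\bar{\pi}' \setminus P_{cyclic} = \langle \bar{A} \stackrel{\bar{\pi}_0'}{\Rightarrow} \bar{\gamma}'\bar{B}\bar{\delta}' \stackrel{\bar{\pi}_1}{\Rightarrow} \bar{\gamma}'\psi(\xi)\bar{B}\psi(\zeta)\bar{\delta}' \stackrel{\bar{p}}{\Rightarrow} \bar{\gamma}'\psi(\xi)\bar{\beta}\psi(\zeta)\bar{\delta}' \rangle \setminus P_{cyclic}$
$= \langle \bar{A} \stackrel{\bar{\pi}_0}{\Rightarrow} \bar{\gamma}\bar{B}\bar{\delta} \stackrel{\bar{p}}{\Rightarrow} \bar{\gamma}\bar{\beta}\bar{\delta} \rangle \setminus P_{cyclic} = \bar{\pi} \setminus P_{cyclic}$.

Define $\mathit{Expand}(\bar{\pi}) = \pi$.

This completes the induction, and thus the construction.

∎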
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2.2.4  Inversion 


Contract  and  Expand  are  very  nearly  inverses:  Contract  is  the  right  inverse  of  Expand,  and  is 
nearly  the  left  inverse  as  well.  The  mapping  ( Contract  o  Expand )  is  virtually  an  identity  mapping; 
it  may  add  a  few  cyclic  rules  but  does  not  otherwise  modify  derivations. 

Theorem:  Invertibility  of  Contract  and  Expand 

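Let $\bar{\pi} \in \bar{P}^*$ be an abstract derivation such that $\bar{A} \stackrel{\bar{\pi}}{\Rightarrow} \bar{\alpha}$, where $\bar{A} \in \bar{N}$ and $\bar{\alpha} \in (\bar{\Sigma} \cup \bar{N})^*$.

Define the abstract derivation $\bar{\pi}' \in \bar{P}^*$ by $\bar{\pi}' = \mathit{Contract}(\mathit{Expand}(\bar{\pi}))$.

Then $\bar{A} \stackrel{\bar{\pi}'}{\Rightarrow} \bar{\alpha}'$, where $\bar{\alpha}' \in (\bar{\Sigma} \cup \bar{N})^*$ and $\bar{\pi}' \setminus P_{cyclic} = \bar{\pi} \setminus P_{cyclic}$.

Let $\pi \in \Pi^*$ be a concrete derivation such that $A \stackrel{\pi}{\Rightarrow} \alpha$, where $A \in \mathcal{A}_{chain}$ and $\alpha \in (\Sigma \cup \mathcal{A}_{chain})^*$.
Define the concrete derivation $\pi' \in \Pi^*$ by $\pi' = \mathit{Expand}(\mathit{Contract}(\pi))$.

Then $A' \stackrel{\pi'}{\Rightarrow} \alpha'$, where $A' \in \mathcal{A}$, $\alpha' \in (\Sigma \cup \mathcal{A})^*$, $\psi(A') = \psi(A)$, and $\psi(\alpha') = \psi(\alpha)$.

If $A \in \mathcal{A}$ then $A' = A$, if $\alpha \in (\Sigma \cup \mathcal{A})^*$ then $\alpha' = \alpha$, and if $A \in \mathcal{A}$ and $\alpha \in (\Sigma \cup \mathcal{A})^*$ then $\pi' = \pi$.
If $\pi = \emptyset_A$, then $\pi' = \mathit{Expand}(\emptyset_{\psi(A)})$ is non-deterministically defined, and $\pi' = \emptyset_{\mathit{Cycled}(\mathit{Chained}(A))}$
is the only choice for which the preceding claim holds true.

∎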


2.3  Simplifications 

The definition of grammatical abstraction permits a set of concrete non-terminals to be cycle
equivalent in very complicated ways. The members of a concrete non-terminal cycle equivalence
class can relate to each other in arbitrarily complex cycles involving any number of cyclic concrete
derivations.


Example:

To illustrate the sort of strange cycles that may exist, consider adding both the concrete rule
(primary → [expr]) and the corresponding abstract rule (expr → [expr]) to the example grammars.
Further add the rule (expr → [expr]) to $P_{cyclic}$. Although $\mathcal{G}$ is still a grammatical expansion of $\bar{\mathcal{G}}$,
these rules add an extra link in the expr cyclic equivalence set. One consequence of this would be
that Coerce(primary, expr) could be either primary ⇒ [expr] or primary ⇒ (expr).


However,  the  concept  of  grammatical  expansion  can  be  applied  more  easily  if  there  is  at  most  one 
cyclic  concrete  derivation  for  each  concrete  non-terminal  cycle  equivalence  class.  This  section  adds 
this  requirement  and  explores  the  consequences. 

Restriction: At Most One Cyclic Rule For Each Abstract Non-Terminal

With only a single cyclic rule for an abstract non-terminal, there can be only a single cyclic
concrete derivation for the corresponding concrete non-terminal cycle equivalence class. Further, if
this derivation is divided into sub-derivations in $\Pi$, all but one of the sub-derivations will be in
$\Pi_{trivial}$. This has implications for the concrete representation of abstract non-terminals and for the
determinism of Coerce.

For all $p, q \in P_{cyclic}$, if $\mathit{lhs}(p) = \mathit{lhs}(q)$ then $p = q$.

∎

Example: 

The restriction is true of the given $P_{cyclic}$.

Again consider adding both the concrete rule (primary → [expr]) and the corresponding abstract
rule (expr → [expr]) to the example grammars.

Whether or not this new abstract rule is added to $P_{cyclic}$, $\mathcal{G}$ is still a grammatical expansion of $\bar{\mathcal{G}}$.
Note that $P_{cyclic}$ must contain at least one of the abstract rules (primary → (expr)) and (expr → [expr])
for $\mathcal{G}$ to be a grammatical expansion of $\bar{\mathcal{G}}$ relative to $P_{cyclic}$.

However, because of the restriction, $P_{cyclic}$ may contain either this new abstract rule or the original
abstract rule (expr → (expr)), but not both.

The  result  of  applying  Expand  to  a  zero-length  abstract  derivation  may  be  non-deterministic. 
Further,  the  concrete  context  in  which  the  expansion  occurs  may  require  some  property  of  the 
result  which  is  true  of  some  but  not  all  of  the  possible  results. 

For example, non-deterministically define the concrete non-terminal A = lhs(Expand($\emptyset_{expr}$)).
Sometimes it is necessary that A ⇒ term * factor, which is true when A ∈ {expr, term}.

Other times it is necessary that expr ⇒ term * A, which is true when A ∈ {factor, primary}.

As illustrated by this example, it is not possible to produce a single result which will satisfy all
possible contexts. However, it is possible to produce one result that will satisfy all possible contexts
where the non-terminal appears in the lhs of a derivation, as in the first case, and another result
that will satisfy all possible contexts where the non-terminal appears in the rhs of a derivation, as
in the second case.

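For each abstract non-terminal $\bar{A}$ there is a unique concrete non-terminal $\mathit{Top}(\bar{A})$ which derives any
concrete non-terminal in $\mathcal{A}$ that maps onto $\bar{A}$. This top non-terminal may always be used as the
concrete non-terminal corresponding to $\bar{A}$ when $\bar{A}$ is the lhs of an abstract derivation. The name
Top is used because the lhs of an abstract derivation is represented by the top of the corresponding
syntax tree.

Definition: Abstract Non-Terminal Top Concrete Representation

Define $\mathcal{A}_{top} \subseteq \mathcal{A}$ by

$\mathcal{A}_{top} = \{ A \in \mathcal{A} \mid \neg\exists \pi \in \Pi_{trivial}^*,\ B \stackrel{\pi}{\Rightarrow} A \text{ where } B \in \mathcal{A} \text{ and } B \neq A \}$.

$\psi \cap (\mathcal{A}_{top} \times \bar{N})$ is invertible.

Define $\mathit{Top} : (\bar{\Sigma} \cup \bar{N}) \to (\Sigma \cup \mathcal{A}_{top})$ by $\mathit{Top} = (\psi \cap ((\Sigma \cup \mathcal{A}_{top}) \times (\bar{\Sigma} \cup \bar{N})))^{-1}$.

For all $\bar{A} \in \bar{N}$, $\mathit{Top}(\bar{A}) \stackrel{*}{\Rightarrow}_{\Pi_{trivial}} \mathit{lhs}(\mathit{Expand}(\emptyset_{\bar{A}}))$.

∎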


Example: 

$\mathcal{A}_{top}$ = {stmt, expr}.

Top(stmt) = stmt and Top(expr) = expr.

For  each  abstract  non-terminal  A  there  is  a  unique  concrete  non-terminal  Bottom(A)  which  can  be 
derived  from  any  concrete  non-terminal  in  A  that  maps  onto  A.  This  bottom  non-terminal  may 
always  be  used  as  the  concrete  non-terminal  corresponding  to  A  when  A  is  in  the  rhs  of  an  abstract 
derivation.  The  name  Bottom  is  used  because  the  rhs  of  an  abstract  derivation  is  represented  by 
the  bottom  or  frontier  of  the  corresponding  syntax  tree. 

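Definition: Abstract Non-Terminal Bottom Concrete Representation

Define $\mathcal{A}_{bottom} \subseteq \mathcal{A}$ by

$\mathcal{A}_{bottom} = \{ A \in \mathcal{A} \mid \neg\exists \pi \in \Pi_{trivial}^*,\ A \stackrel{\pi}{\Rightarrow} B \text{ where } B \in \mathcal{A} \text{ and } B \neq A \}$.

$\psi \cap (\mathcal{A}_{bottom} \times \bar{N})$ is invertible.

Define $\mathit{Bottom} : (\bar{\Sigma} \cup \bar{N}) \to (\Sigma \cup \mathcal{A}_{bottom})$ by $\mathit{Bottom} = (\psi \cap ((\Sigma \cup \mathcal{A}_{bottom}) \times (\bar{\Sigma} \cup \bar{N})))^{-1}$.

For all $\bar{A} \in \bar{N}$, $\mathit{lhs}(\mathit{Expand}(\emptyset_{\bar{A}})) \stackrel{*}{\Rightarrow}_{\Pi_{trivial}} \mathit{Bottom}(\bar{A})$.

∎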


Example: 

$\mathcal{A}_{bottom}$ = {stmt, primary}.

Bottom(stmt) = stmt and Bottom(expr) = primary.

For any abstract non-terminal $\bar{A} \in \bar{N}$, the concrete non-terminal $\mathit{Top}(\bar{A})$ (or $\mathit{Bottom}(\bar{A})$) may
always be used for $\mathit{lhs}(\mathit{Expand}(\emptyset_{\bar{A}}))$ when it is known that the concrete non-terminal will be used
in the lhs (or rhs) of a concrete derivation, as is the case for top-down syntactic elaboration (or
bottom-up incremental parsing).
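
The Top and Bottom sets of the running example can be recomputed with a short sketch. It assumes, which this section does not state explicitly, that the only trivial derivations among the abstracted non-terminals form the chain expr ⇒ term ⇒ factor ⇒ primary. The names `TRIVIAL`, `PSI`, `tops`, and `bottoms` are illustrative and are not part of Ladle.

```python
# Assumed single-step trivial derivations of the example grammar (parent => child).
TRIVIAL = {("expr", "term"), ("term", "factor"), ("factor", "primary")}

# Assumed abstraction map psi from concrete to abstract non-terminals.
PSI = {"stmt": "stmt", "expr": "expr", "term": "expr",
       "factor": "expr", "primary": "expr"}


def tops():
    """Concrete non-terminals that no other non-terminal trivially derives."""
    derived = {child for _, child in TRIVIAL}
    return {a for a in PSI if a not in derived}


def bottoms():
    """Concrete non-terminals that trivially derive no other non-terminal."""
    deriving = {parent for parent, _ in TRIVIAL}
    return {a for a in PSI if a not in deriving}


# Top and Bottom invert psi restricted to the respective sets.
TOP = {PSI[a]: a for a in tops()}
BOTTOM = {PSI[a]: a for a in bottoms()}
```

Because the assumed trivial derivations form a simple chain, checking direct predecessors and successors is enough; a grammar with longer or branching trivial derivations would need a transitive closure.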

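The cyclic rules specified by the restricted $P_{cyclic}$ determine a unique Coerce mapping.

Definition: Concrete Non-Terminal Coercion (Unique)

With the restriction placed on $P_{cyclic}$, there is only one Coerce mapping.

There is a unique minimal partial order $\prec$ on $\mathcal{A}$ such that

for all $\bar{A} \in \bar{N}$, for all $B, C \in (\psi \cap (\mathcal{A} \times \bar{N}))^{-1}(\bar{A})$, $(\mathit{Coerce}(B, C) \in \Pi_{trivial}^* \Leftrightarrow C \succeq B)$.

There is an efficient algorithm to compute Contract ∘ Coerce:

$\forall B, C \in \mathcal{A}$, where $\psi(B) = \psi(C)$,

$\mathit{Contract}(\mathit{Coerce}(B, C)) = \begin{cases} \emptyset_{\psi(B)} & \text{if } C \succeq B \\ p \in P_{cyclic} \text{ with } \mathit{lhs}(p) = \psi(B) & \text{otherwise.} \end{cases}$

∎

Example:

expr $\prec$ term $\prec$ factor $\prec$ primary.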
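
A minimal sketch of Contract ∘ Coerce for the example order follows. It assumes that a coercion consisting only of trivial derivations contracts to a null derivation, and that any other coercion passes once through the single cyclic derivation and therefore contracts to the one cyclic rule of the abstract non-terminal. The table names and the tuple encoding of the result are hypothetical.

```python
# Assumed order of the example: expr < term < factor < primary.
ORDER = ["expr", "term", "factor", "primary"]

# Assumed single cyclic abstract rule for the abstract non-terminal expr.
CYCLIC_RULE = {"expr": "expr -> ( expr )"}


def contract_of_coerce(b, c, abstract="expr"):
    """Contract(Coerce(b, c)) for b, c that both map onto `abstract`."""
    if ORDER.index(b) <= ORDER.index(c):
        # Only trivial derivations are needed, so nothing survives contraction.
        return ("null", abstract)
    # Going "upwards" requires the cyclic derivation exactly once.
    return ("rule", CYCLIC_RULE[abstract])
```

The lookup is constant time per pair once the order is known, which is what makes the coercion step of Expand cheap.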


2.4  Contract  and  Expand  Algorithms 

The Contract and Expand mappings convert between abstract and concrete derivations. An abstract
derivation is represented by an abstract syntax tree. A concrete derivation can be similarly
represented by a concrete syntax tree. Thus algorithms for the Contract and Expand mappings
apply to abstract and concrete syntax trees. A Contract algorithm takes a concrete syntax tree
and produces a corresponding abstract syntax tree, and an Expand algorithm does the reverse.

The basic algorithms for Contract and Expand are identical. The abstract [concrete] syntax tree
is partitioned into a set of sub-trees that represent abstract [concrete] derivations in $\bar{P}$ [$\Pi$]. By
the definition of grammatical expansion, each such abstract [concrete] derivation corresponds to a
unique concrete [abstract] derivation in $\Pi$ [$\bar{P}$]; a concrete [abstract] sub-tree is constructed to
represent this corresponding derivation. The resulting concrete [abstract] sub-trees are then combined
into a single concrete [abstract] tree in the same way that the set of abstract [concrete] sub-trees
formed the original abstract [concrete] tree.

The  algorithm  just  described  has  five  steps: 

1.  Partition  the  syntax  tree. 

2.  Determine  the  derivation  represented  by  each  sub-tree. 

3.  Determine  the  other  grammar  derivation  corresponding  to  each  represented  derivation. 

4.  Construct  a  sub-tree  for  each  such  corresponding  derivation. 

5.  Combine  these  sub-trees  into  a  single  tree. 

The first step is easily done for Expand by dividing the tree into unit sub-trees, and for Contract by
dividing the tree at nodes that represent symbols in $\mathcal{A}_{chain}$. For Expand, the second step is trivial,
since each unit sub-tree designates the rule (or derivation) that the sub-tree represents. The third
step can be handled by a simple table lookup. The fourth step is a straightforward tree operation.
For Contract, the fifth step is also a straightforward tree operation. However, the fifth step for
Expand and the second step for Contract are not so simple.

Syntax  trees  are  combined  by  appending  one  tree  to  another  at  a  leaf  of  the  latter  tree,  where  the 
non-terminal  R  represented  by  the  former  tree’s  root  is  the  same  as  L ,  the  non-terminal  represented 
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by  the  latter  tree’s  leaf.  However,  the  Expand  algorithm  needs  to  combine  trees  for  which  R  and  L 
are  not  necessarily  the  same,  although  they  are  guaranteed  to  be  cycle  equivalent.  This  problem  is 
handled  by  constructing  a  third  syntax  tree  whose  root  represents  L  and  whose  unique  non-terminal 
leaf  represents  R.  Such  a  tree  can  be  constructed  by  applying  steps  three  and  four  to  the  derivation 
Contract(Coerce(L,  R)),  which  is  efficiently  computable  using  the  results  of  the  previous  section. 
With  this  third  syntax  tree  interposed  between  the  other  two,  it  is  possible  to  construct  a  single 
syntax  tree  that  combines  the  two  trees. 
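
The interposed coercion tree can be illustrated with a small sketch. It assumes the example's trivial chain expr ⇒ term ⇒ factor ⇒ primary and the cyclic rule primary → ( expr ), and it builds the coercion tree directly rather than by table lookup on Contract(Coerce(L, R)) as described above. `Node`, `chain_down`, `coercion_tree`, `graft`, and `frontier` are hypothetical helpers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)


# Assumed trivial chain of the running example.
CHAIN = ["expr", "term", "factor", "primary"]


def chain_down(first, last):
    """Build first => ... => last along CHAIN; return (root, bottom node)."""
    i, j = CHAIN.index(first), CHAIN.index(last)
    root = bottom = Node(CHAIN[i])
    for sym in CHAIN[i + 1 : j + 1]:
        child = Node(sym)
        bottom.children.append(child)
        bottom = child
    return root, bottom


def coercion_tree(L, R):
    """Tree whose root is L and whose only non-terminal leaf is R."""
    if CHAIN.index(L) <= CHAIN.index(R):
        return chain_down(L, R)                  # trivial derivations only
    root, primary = chain_down(L, "primary")     # descend to the bottom
    inner, leaf = chain_down("expr", R)          # re-enter at the top
    primary.children = [Node("("), inner, Node(")")]  # assumed cyclic rule
    return root, leaf


def graft(upper_leaf, lower_root):
    """Attach lower_root below upper_leaf via an interposed coercion tree."""
    root, leaf = coercion_tree(upper_leaf.symbol, lower_root.symbol)
    leaf.children = lower_root.children    # coercion leaf R absorbs the lower root
    upper_leaf.children = root.children    # upper leaf L absorbs the coercion root


def frontier(node):
    """Leaf symbols of a tree, left to right."""
    if not node.children:
        return [node.symbol]
    return [s for child in node.children for s in frontier(child)]
```

Grafting a term-rooted tree under a factor leaf, for instance, inserts the parentheses that the concrete grammar requires, while grafting a primary-rooted tree under an expr leaf only inserts the trivial chain.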


Determining the $\Pi$ derivation represented by a concrete syntax tree can be done in a bottom-up
manner as follows: For each sub-tree whose leaves are leaves of the whole tree, generate a state
representing the sub-derivations it matches, called the Contract state. The set of Contract states
is $\Pi_{sub}$, the (rightmost) sub-derivations of $\Pi$, defined by

$\Pi_{sub} = \{ \pi \in P^* \mid \langle A \Rightarrow \xi B \zeta \stackrel{\pi}{\Rightarrow} \xi\beta\zeta \rangle \in \Pi, \text{ where } A \in \mathcal{A}_{chain},\ B \in N,\ \xi, \zeta, \beta \in (\Sigma \cup \mathcal{A}_{chain})^*, \text{ and } \pi \text{ is rightmost} \}$

Note that $\Pi \subseteq \Pi_{sub}$, so $\Pi_{sub}$ contains a unique state for each derivation in $\Pi$, as well as for each
sub-derivation. Like $\Pi$, $\Pi_{sub}$ can be infinite in size, but its derivations have a finite number of
distinct lhs and rhs.


Example: 

$\Pi_{sub}$ = {5, 6, 1, 2, 7, 12, 3, 4:5, 4:6, 8, 9, 10, 11, 13, 14}.

Note that $\Pi_{sub} \setminus \Pi$ = {5, 6}, since there are only a couple of derivations in $\Pi$ with length greater
than one.

The Contract state of a tree node representing a rule p does not necessarily depend on the state of
each of the node’s children. For example, the state of a child node representing a terminal $a \in \Sigma$ or
a non-terminal $A \in \mathcal{A}_{chain}$ is irrelevant, since such a node will always have exactly the same state
($\emptyset_a$ or $\emptyset_A$). Similarly, the state of a child node representing a non-terminal from which only one
derivation in $\Pi_{sub}$ begins is also always the same, by definition. Define $N_{ambig} \subseteq N$ by

$N_{ambig} = \{ A \in N \setminus \mathcal{A}_{chain} \mid \exists \pi, \pi' \in \Pi_{sub},\ A \stackrel{\pi}{\Rightarrow} \alpha,\ A \stackrel{\pi'}{\Rightarrow} \alpha', \text{ where } \alpha, \alpha' \in (\Sigma \cup \mathcal{A}_{chain})^*, \text{ and } \alpha \neq \alpha' \}$.

$N_{ambig}$ is the set of non-abstracted non-terminals such that any node representing a member of this
set has more than one possible state, depending on the states of its child nodes.


Example: 


$N_{ambig}$ = {else-clause}, since else-clause is the only concrete non-terminal that is not abstracted,
and it is the lhs of more than one rule.

The Contract state of a tree node representing rule p depends only on the state of child nodes that
correspond to symbols of $N_{ambig}$ in rhs(p). If there are no such symbols in rhs(p), the node’s
state does not depend on the state of its children at all. Conversely, when rhs(p) does contain
symbols in $N_{ambig}$, the node’s state depends on the state of the corresponding child nodes. Define
$P_{ambig} \subseteq P$ by

$P_{ambig} = \{ p \in P \mid \mathit{rhs}(p) = \xi A \zeta, \text{ where } A \in N_{ambig} \text{ and } \xi, \zeta \in (\Sigma \cup \mathcal{A}_{chain})^* \}$.

$P_{ambig}$ is the set of rules such that the state of any node representing a member of this set depends
on the state of some of that node’s children.


Example: 

Pambig  =  {4},  since  only  concrete  rule  4  contains  the  concrete  non-terminal  else-clause,  which  is 
the  only  element  of  Nambig. 

The Contract state for any tree node representing a rule not in $P_{ambig}$ can be computed from
the rule itself. However, even this computation is unnecessary if the node’s state will never be
examined. A node’s state is examined only when the node represents a non-terminal in $N_{ambig}$ or
the node is the root of a sub-tree representing a derivation in $\Pi$. In the latter case, the node also
represents a non-terminal in $\mathcal{A}_{chain}$. Thus, $\mathcal{A}_{chain} \cup N_{ambig}$ is the set of non-terminals such that a
state must be computed for any node representing a member of this set.


Each concrete rule $p \in P$ is assigned an abstraction action action(p) describing the computation
that must be performed to determine the Contract state for any node representing p.

Define $\mathit{action} : P \to \{\textit{none}, \textit{mark}, \textit{disambiguate}\}$ by

$\forall p \in P$,

$\mathit{action}(p) = \begin{cases} \textit{mark} & \text{if } p \notin P_{ambig} \text{ and } \mathit{lhs}(p) \in (\mathcal{A}_{chain} \cup N_{ambig}) \\ \textit{disambiguate} & \text{if } p \in P_{ambig} \\ \textit{none} & \text{otherwise.} \end{cases}$

Mark computes a node’s state without examining its children. Disambiguate computes a node’s
state by examining the states of at least some of a node’s children. None naturally does nothing.


Example:

p     action(p)
1     mark (none)
2     mark (none)
3     mark
4     disambiguate
5     mark
6     mark
7     mark (none)
8     mark
9     mark (none)
10    mark
11    mark (none)
12    mark (none)
13    mark
14    mark

Note that some of the mark entries have none in parentheses. Such rules mark derivations in
$\Pi_{trivial}$, which are typically irrelevant to the construction of an abstract syntax tree.
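
The action mapping can be sketched as a three-way test. The sets below follow the example where the text gives them ($N_{ambig}$ = {else-clause}, $P_{ambig}$ = {4}); the lhs table, the contents of `A_CHAIN`, and rule 15 with lhs `arg-list` are hypothetical and serve only to exercise each branch.

```python
# Assumed chain non-terminals of the example grammar.
A_CHAIN = {"stmt", "expr", "term", "factor", "primary"}
N_AMBIG = {"else-clause"}
P_AMBIG = {4}

# Hypothetical lhs table for a few rules; rule 15 is invented for illustration.
LHS = {3: "stmt", 4: "stmt", 5: "else-clause", 6: "else-clause", 15: "arg-list"}


def action(p):
    """Abstraction action of concrete rule p."""
    if p in P_AMBIG:
        return "disambiguate"      # state depends on some children
    if LHS[p] in A_CHAIN | N_AMBIG:
        return "mark"              # state is fixed by the rule itself
    return "none"                  # state is never examined
```

Testing membership in $P_{ambig}$ first is equivalent to the definition, since the mark case explicitly excludes rules in $P_{ambig}$.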
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Each  concrete  rule  p  also  has  some  data  data(p)  used  in  the  state  computation. 

For action(p) = none, data(p) = $\bot$.

When action(p) = mark, data(p) = $\pi \in \Pi_{sub}$, where $\pi$ is the unique concrete derivation in $\Pi_{sub}$ that
begins with p.

If action(p) = disambiguate, data(p) is a set of pairs $\{\langle\langle \pi_1, \ldots, \pi_n \rangle, \pi\rangle\}$. The first element of each
pair, $\langle \pi_1, \ldots, \pi_n \rangle$, is a tuple of possible Contract states for tree nodes representing the $N_{ambig}$
non-terminals in p’s rhs. The second element of each pair, $\pi \in \Pi_{sub}$, is the state of a node representing
p whose $N_{ambig}$ children have those states $\pi_1 \ldots \pi_n$. data(p) contains all such possible pairs.


Example:

p     data(p)
1     1
2     2
3     3
4     {⟨5, 4:5⟩, ⟨6, 4:6⟩}
5     5
6     6
7     7
8     8
9     9
10    10
11    11
12    12
13    13
14    14

Note that when $\Pi_{sub}$ is infinite in size, the data mapping as just described maps some rules onto
infinite sets. This problem can be eliminated by arbitrarily choosing $\Pi_{sub}' \subseteq \Pi_{sub}$ such that for each
derivation $\pi \in \Pi_{sub}$ there is a unique representative derivation $\pi' \in \Pi_{sub}'$ such that lhs($\pi$) = lhs($\pi'$)
and rhs($\pi$) = rhs($\pi'$). Since the derivations of $\Pi_{sub}$ have a finite number of distinct lhs and
rhs, there is at least one such finite $\Pi_{sub}'$. The definition of the data mapping is then modified
by substituting each Contract state $\pi' \in \Pi_{sub}'$ for all of the Contract states $\pi \in \Pi_{sub}$ that it
represents. This modification yields a data mapping that does not map any rule onto an infinite
set.

Given a concrete syntax tree representing a derivation in $\Pi$, the action and data mappings can be
used to determine the abstract rule or null derivation to which the concrete derivation corresponds.
First a Contract state is computed for the root of the tree, bottom-up. The operation at a tree
node representing the rule p is simple:

If  action(p)  is  none,  no  Contract  state  need  be  computed. 

If  action(p)  is  mark,  the  Contract  state  for  the  rule  is  data(p). 

If  action(p)  is  disambiguate,  the  Contract  state  is  determined  by  matching  the  Contract 
states  of  the  node’s  children  to  those  in  the  state  tuples  in  the  pairs  of  data(p). 

Once  the  Contract  state  for  the  tree’s  root  has  been  computed,  the  corresponding  abstract  rule  or 
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null  derivation  can  be  looked  up  in  a  table.  In  no  case  is  any  tree  node  examined  more  than  once, 
and  often  whole  portions  of  the  tree  are  not  examined  at  all. 
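
The state computation can be sketched as a recursive function over a tree. The `ACTION` and `DATA` tables below hold only the entries for rules 4, 5, and 6 from the examples; the `Node` class, the shape of rule 4's right-hand side, and the encoding of states as integers or strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    symbol: str
    rule: Optional[int] = None          # None for leaves
    children: list = field(default_factory=list)


N_AMBIG = {"else-clause"}
ACTION = {4: "disambiguate", 5: "mark", 6: "mark"}
# data(4) maps tuples of child states to the node's state; data(5), data(6) are states.
DATA = {4: {(5,): "4:5", (6,): "4:6"}, 5: 5, 6: 6}


def contract_state(node):
    """Contract state of `node`, visiting only the children that matter."""
    act = ACTION.get(node.rule, "none")
    if act == "none":
        return None
    if act == "mark":
        return DATA[node.rule]
    # disambiguate: consult only children that represent N_ambig non-terminals
    child_states = tuple(
        contract_state(child) for child in node.children if child.symbol in N_AMBIG
    )
    return DATA[node.rule][child_states]
```

For a node representing rule 4, only the else-clause child is visited; the other children are skipped entirely, which reflects the remark that whole portions of the tree need not be examined.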


3  Input 


A  Ladle  input  specification  provides  the  following  four  things  for  a  language: 

a  set  of  textual  expressions  for  the  lexemes 

an  abstract  extended  context-free  grammar  for  the  language 

an  internal  representation  for  syntax  trees  of  the  language 

an  LALR(l)  concrete  extended  context-free  grammar  for  the  language 

Every  Ladle  description  has  the  form: 

“LANGUAGE” ⟨identifier⟩

“LEXICAL”

⟨lexical-definition⟩*

“ABSTRACT”

⟨abstract-definition⟩*

“CONCRETE”

⟨concrete-definition⟩*

The  identifier  names  the  language.  Each  set  of  definitions  describes  the  appropriate  feature  of 
the  language.  The  abstract  section  defines  both  the  abstract  syntax  and  the  syntax  tree  IR.  The 
concrete  section  may  be  omitted.  Appendix  B  gives  both  an  example  and  a  description  of  a  Ladle 
description. 

A Ladle identifier must begin with an alphabetic character, and may contain any number of
alphabetic, numeric, or underscore (“_”) characters. Integers are decimal and unsigned. The
keywords of Ladle are case insensitive. Newlines, tabs, and formfeeds between Ladle tokens are
whitespace.

A Ladle description may contain comments anywhere that whitespace is legal. A Ladle comment
consists of any amount of text between “/*” and “*/”. Comments may be nested.
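
Nested comments cannot be recognized with a simple search for the first closing delimiter; a depth counter is needed. The sketch below assumes the delimiters “/*” and “*/” and replaces each outermost comment with a single space so that neighbouring tokens stay separated. The function name is hypothetical, and delimiters that occur inside strings are not treated specially.

```python
def strip_comments(text):
    """Replace each (possibly nested) /* ... */ comment with one space."""
    out = []
    depth = 0
    i = 0
    while i < len(text):
        two = text[i:i + 2]
        if two == "/*":
            depth += 1
            i += 2
        elif two == "*/" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                out.append(" ")     # a comment counts as whitespace
        else:
            if depth == 0:
                out.append(text[i])
            i += 1
    if depth != 0:
        raise ValueError("unterminated comment")
    return "".join(out)
```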

The  Ladle  processor  requires  that  the  concrete  grammar  be  a  grammatical  expansion  of  the  abstract 
grammar.  In  addition,  it  imposes  the  following  requirement  on  a  language  specification: 

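For all $\pi, \pi' \in \Pi_{cyclic}$ such that $A \stackrel{\pi}{\Rightarrow} \alpha$ and $B \stackrel{\pi'}{\Rightarrow} \beta$,

where $A, B \in \mathcal{A}_{chain}$ and $\alpha, \beta \in (\Sigma \cup \mathcal{A}_{chain})^*$,
if $\psi(A) = \psi(B)$ then

$A = B$ and there is a $C \in \mathcal{A}_{chain}$ such that $\alpha, \beta \in \Sigma^*\{C\}\Sigma^*$.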

It  is  an  error  for  Ladle  input  not  to  satisfy  this  condition.  Typical  Ladle  descriptions  satisfy 
the  requirement  anyway,  so  it  is  not  an  issue  in  practice.  The  requirement  simplifies  the  theory 
somewhat,  as  discussed  in  Section  2.3. 




3.1  The  Lexical  Section 


The lexical section of the language specification defines the lexemes for the language. The lexemes
are the terminals for both the abstract and concrete grammars, plus lexical constructs such as
whitespace and comments that are not grammar terminals. A single set of terminals is used for
both the abstract and concrete grammars, so $\Sigma = \bar{\Sigma}$ and $\psi_0 \cap (\Sigma \times \bar{\Sigma}) = I_{\Sigma}$. A lexical-definition
has the form:

⟨identifier⟩ “=” ⟨lex-expression⟩ “=>” ⟨lex-status⟩

The  identifier  names  the  lexeme.  Each  lexeme  must  have  a  unique  name. 


The  lex-expression  is  an  extended  regular  expression  describing  the  set  of  strings  that  are  instances 
of  the  lexeme.  The  Ladle  processor  combines  all  of  these  expressions  into  an  automaton  that  scans 
text  a  character  at  a  time  until  the  longest  possible  lexeme  has  been  recognized.  This  automaton 
looks  only  one  character  ahead,  and  never  backs  up.  If  one  lexeme  is  a  prefix  of  another,  the 
automaton  will  optimistically  expect  the  longer  lexeme  if  the  lookahead  character  is  appropriate. 
When  the  portion  of  text  that  is  currently  being  scanned  doesn’t  match  any  lexeme,  a  special  error 
lexeme  is  recognized. 
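
The longest-match rule and the error lexeme can be illustrated with a small scanner. This sketch uses Python regular expressions instead of the single combined automaton described above, so it does not reproduce the one-character lookahead or the no-backup property. The lexeme names and patterns are invented for the example.

```python
import re

# Hypothetical lexeme table: (name, pattern). Every pattern matches at least one character.
LEXEMES = [
    ("integer", re.compile(r"[0-9]+")),
    ("identifier", re.compile(r"[A-Za-z][A-Za-z0-9_]*")),
    ("assign", re.compile(r":=")),
    ("colon", re.compile(r":")),
    ("blank", re.compile(r"[ \t\n\f]+")),
]


def scan(text):
    """Split text into (lexeme name, matched string) pairs, longest match first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in LEXEMES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            best = ("error", pos + 1)   # nothing matches: emit the error lexeme
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens
```

Here “:” is a prefix of “:=”, and the scanner prefers the longer lexeme whenever the following character allows it.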


The basic elements of a lexical expression are strings, case insensitive strings, and character sets.
A string is one or more characters delimited by “"”s. A case insensitive string is the same, but
with a different delimiter; an alphabetic character in the string matches either case of the character. The
word string is normally used to refer to both regular and case insensitive strings. A character set
is one or more characters delimited by “{” and “}”, and represents the set of characters specified.
The “-” character in a character set indicates a range of characters; e.g., “{a-zA-Z}” is the set
of alphabetic characters. The “\” character is a quoting character in strings and character sets,
interpreted as follows:

“\\”     backslash
“\-”     hyphen
“\n”     newline
“\t”     tab
“\d”     delete
“\b”     backspace
“\e”     escape
“\^A”    control-A
“\n”     ascii n, octal (here n stands for an octal number)

No  other  quoting  convention  is  supported;  in  particular  “\A”  is  illegal  in  the  general  case.  It  is 
also  illegal  to  use  “\”  anywhere  else,  for  example,  in  an  identifier. 


The basic lexical operators, in order of precedence from highest to lowest, are:

⟨expr⟩ *             at least 0 repetitions of expr
⟨expr⟩ +             at least 1 repetition of expr
⟨expr⟩ ⟨expr⟩        concatenation
⟨expr⟩ | ⟨expr⟩      alternation
[⟨expr⟩]             optional
(⟨expr⟩)             grouping

There are also some extended lexical operators whose use is more restricted. They are:

⟨expr⟩ - ⟨string⟩            string match
⟨string⟩ ~ ⟨string⟩          balanced string match
⟨string⟩ IN ⟨identifier⟩     keyword

The string operands of the match operators “-” and “~” must not be case insensitive strings. Both
of these operators match strings that begin with the left operand and end with the right operand.
The balanced version only matches strings with balanced occurrences of the left and right operands.
The  match  operators  must  not  be  nested  within  either  of  the  repetition  operators.  Also,  if  a  match 
operator  appears  in  an  operand  of  the  concatenation  operator,  it  must  be  in  the  final  operand. 
The  keyword  operator  “IN”  is  useful  for  defining  special  instances  of  other  lexical  patterns  that 
are  to  be  treated  as  distinct  lexemes.  The  right  operand  must  be  the  name  of  a  previously  defined 
lexeme.  The  left  operand  may  be  any  string  that  matches  the  expression  for  this  previously  defined 
lexeme.  The  keyword  operator  may  only  be  nested  within  the  alternation  or  grouping  operators. 
Appendix  B  includes  examples  of  each  of  these  specialized  operators. 

Any  lexeme  whose  expression  operators  are  all  keyword,  concatenation,  or  grouping  operators  is  a 
constant  lexeme.  Such  a  lexeme  matches  only  a  single  string,  although  parts  of  this  string  may  be 
case  insensitive. 

Not  all  lexemes  are  terminal  symbols  for  the  language’s  grammars,  nor  are  all  lexemes  included  in 
the  IR  tree.  The  status  of  a  lexeme  is  described  by  its  lex-status ,  which  must  be  one  of 

IGNORE 

SCREEN 

OMIT 

PRESERVE 

or  may  be  left  out,  along  with  the  “=>”  preceding  it.  An  ignored  lexeme  is  not  a  grammar  terminal, 
nor  is  it  ever  in  an  IR  tree.  A  screened  lexeme  is  also  not  a  grammar  terminal,  but  it  should  be 
included in IR trees as an annotation of some sort. Omitted and preserved lexemes are grammar
terminals, and may or may not be found in IR trees. A preserved lexeme is by default represented
explicitly, while an omitted one is by default represented implicitly. In either case, the default can
be  overridden  for  a  particular  instance  of  the  lexeme  in  the  abstract  grammar.  A  warning  is  issued 
when  a  non-constant  lexeme  is  omitted.  If  no  lexeme  status  is  given,  the  default  status  assumed 
is  OMIT  for  constant  lexemes,  PRESERVE  for  all  others.  Appendix  B  includes  examples  of  lexeme 
status.  Typically  status  ignore  is  used  for  whitespace,  screen  for  comment  lexemes,  omit  for 
constant  lexemes  such  as  keywords,  and  preserve  for  non-constant  lexemes,  such  as  identifiers. 

Outside  of  the  lexical  section,  a  lexeme  may  be  referenced  by  its  name,  or  by  a  string.  In  the  latter 
case,  if  there  is  not  already  a  lexeme  defined  as  that  string,  a  new  lexeme  is  so  defined  with  lexical 
status  omit.  The  abstract  and  concrete  sections  must  refer  only  to  the  terminal  lexemes,  which 
are  the  lexemes  that  are  neither  ignored  nor  screened. 
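
The default-status rule and the notion of a terminal lexeme are compact enough to state as code. In this sketch the function names are hypothetical, and whether a lexeme is constant is taken as an input rather than derived from its expression.

```python
def lexeme_status(is_constant, declared=None):
    """Status of a lexeme: the declared one, else OMIT for constants, PRESERVE otherwise."""
    if declared is not None:
        return declared.upper()         # Ladle keywords are case insensitive
    return "OMIT" if is_constant else "PRESERVE"


def is_terminal(status):
    """Terminal lexemes are those that are neither ignored nor screened."""
    return status not in {"IGNORE", "SCREEN"}
```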
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3.2  Extended  Grammar  Notation 


Each of the abstract and concrete sections of the language description contains an extended
context-free grammar. Such a grammar consists of a set of non-terminals, each of which has some
rewrite rules. The rhs of a rewrite rule may take any of these forms:[3]


symbols               conventional rhs
[symbols]             optional
symbols *             at least 0 repetitions of symbols
symbols * lexeme      the same, but repetitions separated by lexeme
symbols +             at least 1 repetition of symbols
symbols + lexeme      the same, but repetitions separated by lexeme
symbols ++            at least 2 repetitions of symbols
symbols ++ lexeme     the same, but repetitions separated by lexeme


Symbols is any non-empty sequence of grammar symbols, although in a conventional rhs it may be
empty. The optional operator is really just a shorthand notation: the rule $A \to [\alpha]$
represents the rules $A \to \alpha$ and $A \to \varepsilon$. Any rule containing one of the
repetition operators is called a sequence rule. The lexeme separating successive repetitions is
called a delimiter. A warning is issued when a delimiter is not a constant lexeme.


The  “++”  sequence  operator  is  not  standard  usage,  although  the  other  extended  grammar  operators 
are.  The  Ladle  description  of  Ladle  in  Appendix  B  contains  an  example  of  its  use  to  describe 
lexical  concatenation.  The  concatenation  of  several  lexical  expressions  is  properly  represented  as  a 
sequence.  However,  a  single  expression  should  not  be  represented  as  a  sequence  of  one  expression, 
but simply as the expression itself. The “++” operator makes this distinction possible.

Neither grammar expansion nor the LALR(k) property is defined on extended grammars, but both
definitions can be generalized. Appendix C describes how to normalize an extended grammar into a
conventional one. With respect to a given $\psi_0$ and $P_{cyclic}$, an extended grammar is said to
be an expansion of another extended grammar if and only if the normalization of the first is an
expansion of the normalization of the second. In practical terms, this means that the sequence
rules of the two grammars must correspond in the same way as other rules, with the sequence
operators considered lexemes. An extended grammar Q is LALR(k) if and only if the normalization
of Q is LALR(k).


3.3  The  Abstract  Section 

The  abstract  section  of  the  language  specification  describes  both  the  language’s  abstract  syntax 
and  the  internal  representation  of  syntax  trees  of  the  language.  The  abstract  syntax  is  described 
as  a  possibly  ambiguous  context-free  grammar  Q.  Each  of  the  definitions  in  the  abstract  section 
defines  a  non-terminal  of  N.  An  abstract-definition  has  the  form: 

[3] In the current implementation of Ladle, a non-terminal which has an optional or sequence rule
may only have a single rule. This was a poor design choice, as Ladle itself is best described by
violating this restriction, as in Appendix B.

(identifier) “=” (abstract-rule-seq)

The  identifier  names  the  non-terminal.  Each  non-terminal  must  have  a  unique  name,  which  must 
not  be  the  name  of  any  lexeme.  The  first  abstract  non-terminal  defined  is  the  start  symbol  S  for 
the  abstract  grammar. 

An  abstract-rule-seq  is  a  sequence  of  abstract  rules,  separated  by  “|”s,  and  containing  at  least  one 
rule.  Each  abstract  rule  follows  this  format: 

(rhs) “=>” (IR-tree-template)

A  rhs  is  any  legal  rhs  as  described  in  Section  3.2.  Note  that  only  abstract  non-terminals  and  the 
terminal  lexemes  are  legal  grammar  symbols  in  an  abstract  rhs. 

An  IR-tree-template  describes  how  to  represent  as  a  tree  the  syntax  defined  by  an  abstract  rule.  An 
IR  template  must  provide  three  pieces  of  internal  tree  representation  information.  First,  it  must 
specify  whether  the  abstract  rule  is  represented  in  the  tree  by  a  node,  by  an  annotation,  or  not  at 
all.  Second,  if  the  rule  is  represented  by  a  node,  the  template  can  specify  which  of  the  symbols 
in  the  rhs  of  the  rule  are  to  have  their  sub-tree  representations  be  children  of  this  node.  Finally, 
the  template  may  specify  the  order  in  which  these  children  should  be  represented.  Each  IR  tree 
template  should  match  one  of 

1.  “IMPLICIT”

2.  “ANNOTATE” (child-spec)

3.  “ANNOTATE”

4.  (identifier) “(” (child-spec-list) “)”

5.  (identifier)

6.  “TREE”

or  may  be  left  out,  along  with  the  preceding  “=>”.  If  no  template  is  specified,  a  default  is  assigned. 
Recall  that  some  terminal  lexemes  are  specified  as  preserved  in  the  lexical  section.  All  non-terminals 
are  by  definition  preserved.  The  set  of  preserved  grammar  symbols  is  information  required  by  many 
of  the  tree  templates. 

A child-spec-list is simply a sequence of child-specs separated by “,”s. This sequence may be empty.
A child-spec is a grammar symbol, optionally followed by an integer in angle brackets (“<”, “>”).
Each child-spec must refer to a grammar symbol in the rhs preceding it. A child-spec-list must refer
to all of the non-terminals in the rhs, and each grammar symbol in the rhs may be referred to at
most once. The child-spec-list may order the grammar symbols arbitrarily, however. The bracketed
integer should be included after symbols that occur multiple times in the rhs: a 1 refers to the
first occurrence, and so forth. A warning is issued if an omitted lexeme is part of the
child-spec-list, or if a preserved non-constant lexeme is part of the rhs but is not in the
child-spec-list.
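
The constraints on a child-spec-list can be illustrated with a small Python check that also returns the rhs index selected by each child-spec. This is a sketch under stated assumptions, not Ladle's implementation: the function name ir_permutation and the data layout are hypothetical, indices are 0-based here, and a child-spec written without a bracketed integer is assumed to mean the first occurrence.

```python
def ir_permutation(rhs, child_specs, nonterminals):
    """Check a child-spec-list against its rhs; return the rhs index of each child.

    rhs          -- list of grammar symbol names
    child_specs  -- list of (symbol, occurrence) pairs; occurrence is a 1-based
                    integer, or None when no bracketed integer was written
    nonterminals -- set of the names that are non-terminals
    """
    positions = {}
    for index, symbol in enumerate(rhs):
        positions.setdefault(symbol, []).append(index)

    chosen = []
    for symbol, occurrence in child_specs:
        if symbol not in positions:
            raise ValueError(f"{symbol} does not occur in the rhs")
        # Assumption: a missing integer means the first occurrence.
        k = 1 if occurrence is None else occurrence
        if not 1 <= k <= len(positions[symbol]):
            raise ValueError(f"{symbol} has no occurrence number {k}")
        index = positions[symbol][k - 1]
        if index in chosen:
            raise ValueError(f"{symbol}<{k}> is referred to more than once")
        chosen.append(index)

    # Every non-terminal of the rhs must be referred to.
    for index, symbol in enumerate(rhs):
        if symbol in nonterminals and index not in chosen:
            raise ValueError(f"non-terminal {symbol} is not in the child-spec-list")
    return chosen
```

With the rhs `expr PLUS expr` and the child-spec-list `expr<2>, expr<1>`, the check succeeds and the children are taken from rhs positions 2 and 0, in that order.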

Each rule with tree template 1 must have exactly one grammar symbol in its rhs, and must not
be a sequence rule. The rule is not represented explicitly in the tree. Its place is held by the node
representing the unique symbol in its rhs.

A rule that uses template 2 is represented as an annotation in the node representing the symbol
referred to by the child-spec argument. The child-specs in these templates must be single-element
child-spec-lists; warnings will be issued accordingly. A rule using template 3 must have exactly one
symbol in its rhs that is preserved, or have only one symbol in its rhs at all. This template is
equivalent to template 2 with that unique symbol as the argument.

Tree  template  4  is  the  most  general.  It  specifies  that  the  rule  containing  the  template  is  represented 
by  a  tree  node,  that  the  identifier  is  the  name  for  the  rule  and  node,  and  that  the  children  of  the  node 
are  the  sub-trees  representing  the  grammar  symbols  in  the  child-spec-list,  in  the  order  specified. 
Template  5  is  equivalent  to  the  general  template  with  the  identifier  as  the  node  name  argument 
and  the  preserved  symbols  of  the  preceding  rhs  in  the  order  they  appear  in  the  rhs  as  the  child  list 
argument.  The  name  of  a  node  as  specified  by  either  of  these  templates  must  be  unique,  and  must 
not  be  the  name  of  a  lexeme  or  an  abstract  non-terminal.  An  abstract  non-terminal  definition  with 
only  one  rule  may  use  template  6.  This  template  is  equivalent  to  template  5  with  the  name  of  the 
non-terminal  as  its  node  name  argument. 

If no tree template is specified for a rule, a default template is assigned as follows: If the rule’s
rhs is exactly one symbol, the default is template 1. If the rule’s rhs contains more than one
preserved symbol, the default is template 5. The name used for the template is “NT-n”, where the
rule is the nth rule for non-terminal NT. Otherwise, the default is template 3. A warning is issued
for the latter two kinds of default templates.
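
The default assignment can be written as a short decision function. The following Python sketch is illustrative only: the name default_template and the returned triple are hypothetical, rule_number is assumed to count from 1, and the sketch does not check whether template 3 is actually legal for the rule.

```python
def default_template(nt_name, rule_number, rhs, preserved):
    """Return (template_number, node_name, warn) for a rule with no template.

    rhs       -- list of grammar symbol names in the rule's rhs
    preserved -- set of preserved grammar symbols (all non-terminals plus
                 the lexemes whose status is PRESERVE)
    """
    if len(rhs) == 1:
        return 1, None, False                      # template 1: IMPLICIT
    kept = [symbol for symbol in rhs if symbol in preserved]
    if len(kept) > 1:
        # template 5: a node named "NT-n" whose children are the preserved symbols
        return 5, f"{nt_name}-{rule_number}", True
    return 3, None, True                           # template 3: ANNOTATE
```

For example, the second rule of a non-terminal `stmt` with rhs `IF expr THEN stmt` would default to a node named `stmt-2`, with a warning.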

Each  sequence  rule  must  be  represented  by  an  actual  tree  node.  Exactly  one  symbol  in  each 
sequence  rule  must  have  its  corresponding  sub-tree  specified  as  a  child  of  the  rule’s  node.  This  is 
so  that  the  sequence  as  a  whole  can  be  represented  as  a  single  node  that  has  as  its  children  exactly 
one  node  for  each  element  of  the  sequence.  Note  that  a  sequence  rule  may  have  no  more  than  one 
non-terminal  in  its  rhs  as  a  consequence.  Further,  no  sequence  delimiter  may  ever  be  explicitly 
represented  in  an  IR  tree. 

The rule $A \to [\alpha]$ is shorthand for the two rules $A \to \alpha$ and $A \to \varepsilon$.
The IR tree template specified for $A \to [\alpha]$ is assigned to rule $A \to \alpha$. Rule
$A \to \varepsilon$ is given a special tree template that represents it by a distinct node for the
rule. This node has the name “EMPTY”, and naturally has no children.

Ladle defines the cyclic rules $P_{cyclic}$ as the set of all abstract rules whose rhs is the rule’s
lhs plus one or more lexemes and whose IR template is ANNOTATE or IMPLICIT, with only the first
rule for a given non-terminal included when there is more than one such rule. For most languages,
there is at most one such rule for each abstract non-terminal.
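
As an illustration of this definition, the following Python sketch selects the cyclic rules from a list of abstract rules. The representation of a rule as a (lhs, rhs, template) triple and the name cyclic_rules are assumptions made for the example.

```python
def cyclic_rules(rules, lexemes):
    """Return the indices of the cyclic rules.

    rules   -- list of (lhs, rhs, template) triples in definition order,
               where rhs is a list of symbol names
    lexemes -- set of lexeme names
    """
    seen = set()
    result = []
    for index, (lhs, rhs, template) in enumerate(rules):
        others = list(rhs)
        if lhs not in others:
            continue
        others.remove(lhs)  # drop exactly one occurrence of the lhs
        is_cyclic = (
            template in ("ANNOTATE", "IMPLICIT")
            and len(others) >= 1
            and all(symbol in lexemes for symbol in others)
        )
        # Only the first such rule of each non-terminal is included.
        if is_cyclic and lhs not in seen:
            seen.add(lhs)
            result.append(index)
    return result
```

A parenthesized-expression rule such as `expr = "(" expr ")" => ANNOTATE` is the typical case this picks out.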


3.4  The  Concrete  Section 


The concrete section specifies an LALR parser for conversion of lexical streams to concrete
derivations that can in turn be Contracted to form abstract derivations. This parser is specified
by an extended LALR(1) context-free concrete grammar that must be an expansion of the abstract
grammar Q. The definition of expansion places such strong constraints on the two grammars that the
concrete grammar is usually very similar to the abstract grammar. Therefore, Ladle requires each
abstract non-terminal to have the same name as the concrete non-terminal to which it corresponds,
that is, $A_{base} = N$ and $\psi_0 \cap (A_{base} \times N) = 1_N$. Thus, the concrete grammar
must be an expansion of the abstract grammar Q relative to the set of cyclic abstract rules
$P_{cyclic}$ specified in the abstract section and the mapping $\psi_0 = 1_{(\Sigma \cup N)}$.
Further, since the grammars are usually so similar, the concrete section of the language
specification need not specify grammar rules for any non-terminal in $A_{base}$ whose concrete
rules correspond exactly to its abstract rules. All non-terminals in $A_{base}$ are assumed by the
Ladle processor to have concrete rules that correspond to the non-terminal’s abstract rules. The
concrete section must contain a set of rules for each concrete non-terminal that is not in
$A_{base}$. The concrete section should also contain an explicit set of rules for non-terminals in
$A_{base}$ whose concrete rules do not exactly correspond to their abstract rules. Rule
specifications of this latter kind override the default set of rules constructed from the abstract
grammar. The concrete section consists of a set of definitions, each of which specifies the rules
for one concrete non-terminal.

A  concrete-definition  has  the  form: 

(identifier) “=” (concrete-rule-seq)

The  identifier  names  the  non-terminal,  and  must  not  be  the  name  of  a  lexeme.  A  non-terminal  may 
be  defined  at  most  once  in  the  concrete  section.  The  concrete-rule-seq  is  a  sequence  of  concrete 
rules,  separated  by  “|”s,  and  containing  at  least  one  rule.  Each  such  rule  must  be  a  legal  rhs 
as  described  in  Section  3.2.  The  set  of  legal  grammar  symbols  for  a  concrete  rhs  consists  of  the 
terminal  lexemes  and  any  non-terminal  defined  in  either  the  abstract  or  concrete  sections. 


4  Output 


The  Ladle  processor  outputs  the  data  needed  to  convert  between  text  and  syntax  trees,  and  to 
manipulate  syntax  trees  directly,  for  a  specified  language.  This  section  describes  that  data  at  a 
fairly  high  level.  The  details  of  the  representation  can  be  found  in  the  implementation.  The  various 
symbols,  rules,  states,  and  so  forth  are  all  represented  in  the  output  by  integers.  The  data  should 
only  be  accessed  indirectly,  using  the  interface  specified  in  Section  5. 

The  output  includes  the  name  of  the  language. 


4.1  Lexical  Data 

A lexical automaton is output. This automaton is an encoding of all the lexemes’ extended regular
expressions. It can be used to extract the lexemes from a text stream. This description is
intentionally left vague, as it is very implementation dependent.

For each lexeme, its text, whether it is ignored, screened, or parsed, and its size are output. The
text of a constant lexeme is its expression string. For a case-insensitive string, the cases of the
characters in the text are those given in the string’s original specification. The text of a
non-constant lexeme is its name. The size of a constant lexeme is the length of its string.
Non-constant lexemes are assigned a size of 0.
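
Stated as code, the text and size of a lexeme follow directly from whether it is constant. This Python fragment is a minimal sketch; the function name and the use of None to mark a non-constant lexeme are assumptions of the example.

```python
def lexeme_text_and_size(name, constant_string=None):
    """Return (text, size) for a lexeme.

    constant_string -- the expression string of a constant lexeme, with the
                       letter cases of its original specification, or None
                       for a non-constant lexeme
    """
    if constant_string is not None:
        return constant_string, len(constant_string)
    # A non-constant lexeme is output under its name with size 0.
    return name, 0
```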

4.2  Abstract  Syntax  Tree  Data


For each abstract non-terminal $A \in N$, its name, size, and the set of abstract rules
$lhs^{-1}(\{A\})$ are output. The size is the number of characters in its name.

For each abstract rule p, its name, arity, kind, lhs, rhs length, rhs, delimiter, and the IR
permutation of its rhs are output. The rhs of a sequence rule is considered to be only the left
operand of the sequence operator; the operator and the delimiter are not included. Special values
are used for the delimiter of non-delimited sequence rules and non-sequence rules.

The name and arity of each abstract rule are both determined by the IR tree template associated
with it. Some of the grammar symbols in the rhs of each abstract rule are part of the template
associated with the rule. The arity of the rule is the number of such symbols in its rhs. All
sequence rules are thus assigned an arity of one. The name of an implicit or annotate rule is
“IMPLICIT” or “ANNOTATE”, respectively. The $\varepsilon$ rules specified by the optional notation
are named “EMPTY”. Every other abstract rule is named by the IR tree template associated with it.

The  kind  of  each  abstract  rule  is  one  of  the  following: 

implicit 

annotate 

normal 

star 

plus 

plural 

All rules with IMPLICIT or ANNOTATE IR tree templates are implicit or annotate rules,
respectively. The kind of a sequence rule with operator “*”, “+”, or “++” is star, plus, or plural,
respectively. All other abstract rules are of kind normal. Sequence rules are always represented
by explicit tree nodes, so an IMPLICIT or ANNOTATE sequence rule is impossible.

The IR permutation of the rhs of each abstract rule relates the order of the symbols in the rule’s
rhs to the order of the child specifications in the IR tree template associated with the rule. For
each child specification in the rule’s template, the rule’s IR permutation gives the rhs index of
the specified grammar symbol.

The Expand, Top, and Bottom mappings are output.


4.3  Parsing  and  Unparsing  Data 

Theoretically, a lexical stream can be converted to a syntax tree by parsing and contracting. A
concrete derivation is constructed from a lexical stream by an LALR parser. Applying Contract to
the concrete derivation yields an abstract derivation, which is then represented by a syntax tree.
Unfortunately, neither LALR parsing nor the Contract algorithm given in Section 2.4 is defined on
extended grammars. To handle these problems, a non-extended LALR parse grammar V is constructed
whose language Lang(V) equals the language of the concrete grammar, and the Contract algorithm is
adjusted to apply to derivations of V rather than of the concrete grammar.

The context-free parse grammar $V = (N^*, \Sigma, P^*, S)$ is a normalization of the concrete
grammar, where $N$ and $P$ below denote the non-terminals and rules of the concrete grammar. $P^*$
is created from $P$ by normalizing all sequence rules. $N^*$ is $N$ plus whatever extra
non-terminals are required for the normalized rules. $\Sigma$ and $S$ are the same as in the
concrete grammar. For each normal transformation, let $A \in N$, $\alpha \in (\Sigma \cup N^*)^*$
such that $|\alpha| \neq 0$, and $d \in \Sigma \cup \{\varepsilon\}$ be given. For each sequence
rule, let a distinct $X \notin N$ be given. The transformations are


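\[
\begin{array}{lcll}
 & & A \to \varepsilon & \text{(abstract)} \\
 & & A \to X & \\
A \to \alpha * d & \text{becomes} & X \to \alpha & \text{(abstract)} \\
 & & X \to X d \alpha & \text{(enqueue)} \\
 & & X \to X d \bar{X} & \text{(append)} \\[1ex]
 & & A \to X & \\
A \to \alpha + d & \text{becomes} & X \to \alpha & \text{(abstract)} \\
 & & X \to X d \alpha & \text{(enqueue)} \\
 & & X \to X d \bar{X} & \text{(append)} \\[1ex]
 & & A \to X & \\
A \to \alpha {+}{+}\, d & \text{becomes} & X \to \alpha d \alpha & \text{(plural)} \\
 & & X \to X d \alpha & \text{(plural)} \\
 & & X \to X d \bar{X} & \text{(plural).}
\end{array}
\]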

The symbol $X$ defined by each application of a transformation is an element of $N^*$. The marked
symbol $\bar{X}$, which appears at the end of each rule of the form $X \to X d \bar{X}$, is not a
distinct symbol from $X$, but is a specification needed for LALR automaton generation, since the
parse grammar may be ambiguous. These transformations are the same as those in Appendix C except
for the addition of the rules containing $\bar{X}$, which are explained in the next section. The
parenthesized keywords after some of the transformation rules are used by the adjusted Contract
algorithm described shortly.
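
To make the transformations concrete, the following Python sketch expands a single sequence rule into parse grammar rules tagged with their keywords. It follows one reading of the transformations listed above and is not taken from the Ladle implementation: the function name, the (lhs, rhs, keyword) triples, and the convention of writing the marked occurrence of X as the string "~X" are all assumptions of the example.

```python
def normalize_sequence_rule(lhs, alpha, operator, delimiter=None, fresh="X"):
    """Expand `lhs -> alpha <operator> [delimiter]` into non-extended rules.

    Returns a list of (lhs, rhs, keyword) triples; keyword is None for
    rules that carry no parenthesized keyword.
    """
    d = [] if delimiter is None else [delimiter]
    alpha = list(alpha)
    x = fresh
    x_marked = "~" + fresh  # hypothetical spelling of the marked X

    if operator == "*":
        head = [(lhs, [], "abstract"), (lhs, [x], None), (x, alpha, "abstract")]
        tail = ("enqueue", "append")
    elif operator == "+":
        head = [(lhs, [x], None), (x, alpha, "abstract")]
        tail = ("enqueue", "append")
    elif operator == "++":
        head = [(lhs, [x], None), (x, alpha + d + alpha, "plural")]
        tail = ("plural", "plural")
    else:
        raise ValueError(f"unknown sequence operator {operator!r}")

    return head + [
        (x, [x] + d + alpha, tail[0]),     # add one more element on the right
        (x, [x] + d + [x_marked], tail[1]),  # join with a following sub-sequence
    ]
```

For a rule such as `args -> expr ++ ","` this yields four rules: `args -> X`, `X -> expr , expr`, `X -> X , expr`, and `X -> X , ~X`, the last three tagged plural.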

Applying the normalization given in Appendix C to the concrete grammar and to the abstract grammar
Q yields two context-free grammars: a normalized concrete grammar that is LALR(k), and a
normalized abstract grammar Q' of which it is an expansion. Unfortunately, applying Contract to a
derivation of the normalized concrete grammar may result in a derivation of Q' that is different
from the appropriate derivation of Q. For example, a Q derivation tree represents each sequence
with a single rule node whose children are the sequence elements, while a Q' derivation tree
represents the same sequence with a left-recursive binary tree whose leaves are the sequence
elements. However, the sequence of leaves in any binary tree can be constructed bottom-up as
follows:

For  each  left  leaf  node,  create  the  sequence  containing  the  leaf:  abstract. 

For  each  internal  node  whose  right  child  is  a  leaf  node,  add  the  right  child  leaf  to  the 
sequence  representing  the  left  child  sub-tree’s  leaves:  enqueue. 

For  each  internal  node  whose  right  child  is  another  internal  node,  append  the  sequences 
representing  the  left  and  right  child  sub-trees’  leaves:  append. 

Using this method, the Contract algorithm can be extended to convert binary trees to sequences.
Let the action' and data' mappings for the normalized concrete grammar and the normalized abstract
grammar Q' be given. Note that the rules of V are a superset of the rules of the normalized
concrete grammar.
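
The three bottom-up steps can be tried out on a toy tree. In this Python sketch a leaf is any non-tuple value and an internal node is a (left, right) pair; that representation and the name flatten are assumptions of the example, not part of Ladle. The function returns both the sequence of leaves and the list of operations performed.

```python
def flatten(tree, ops=None):
    """Return (leaves, ops) for a binary tree, building the leaf sequence bottom-up."""
    ops = [] if ops is None else ops
    if not isinstance(tree, tuple):
        # abstract: create the sequence containing a left leaf
        ops.append("abstract")
        return [tree], ops
    left, right = tree
    sequence, _ = flatten(left, ops)
    if isinstance(right, tuple):
        # append: join the sequences of the left and right sub-trees
        right_sequence, _ = flatten(right, ops)
        ops.append("append")
        sequence.extend(right_sequence)
    else:
        # enqueue: add the right leaf to the left sub-tree's sequence
        ops.append("enqueue")
        sequence.append(right)
    return sequence, ops
```

A purely left-recursive tree such as `((("a", "b"), "c"), "d")` needs one abstract followed only by enqueue steps; append arises only when a right child is itself an internal node.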

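Define $action^* : P^* \to \{$none, mark, abstract, disambiguate, enqueue, append, plural$\}$ by

\[
\forall p \in P^*, \quad
action^*(p) =
\begin{cases}
\text{abstract} & \text{if $p$ was defined as abstract} \\
\text{enqueue} & \text{if $p$ was defined as enqueue} \\
\text{append} & \text{if $p$ was defined as append} \\
\text{plural} & \text{if $p$ was defined as plural} \\
\text{abstract} & \text{if } data'(p) \in \Pi \text{ and } Contract(data'(p)) \in P \\
\text{none} & \text{if } data'(p) \in \Pi \text{ and } Contract(data'(p)) \notin P \\
action'(p) & \text{otherwise}
\end{cases}
\]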


where “defined as” refers to the keyword to the right of the rule as it was defined in the normal
transformations. For “++” sequences there is no abstract operation, and the plural operation
performs either enqueue or append as appropriate. Note that the abstract operation sometimes
takes the place of a mark operation. This facilitates splitting a concrete derivation tree into
sub-trees which represent derivations in $\Pi$. Define the data* mapping for V from data' by
substituting the original abstract sequence rule for each abstract rule produced by normalizing,
wherever such rules appear.


Under some circumstances, more efficient versions of the concrete-to-parse grammar transformations
are applicable. The transformations

\[
\begin{array}{lcll}
 & & A \to X & \\
A \to \alpha * & \text{becomes} & X \to \varepsilon & \text{(abstract)} \\
 & & X \to X \alpha & \text{(enqueue)} \\
 & & X \to X \bar{X} & \text{(append)} \\[1ex]
 & & A \to X & \\
A \to \alpha {+}{+}\, d & \text{become} & X \to \alpha & \text{(abstract)} \\
A \to \alpha & & X \to X d \alpha & \text{(plural)} \\
 & & X \to X d \bar{X} & \text{(plural)}
\end{array}
\]

are used in place of the standard transformations whenever possible. If any of the transformations
results in $A \to X$ being the only rule for $A$, that rule may be omitted, and $A$ used in place
of $X$. This optimization is not applicable to delimited sequences. The action* and data* mappings
must be adjusted for these optimizations.

A  special  LALR  automaton  ACTION  table  is  constructed  for  V.  This  table  differs  from  conventional 
ones  in  that  the  ACTION  mapping  is  defined  not  only  on  terminals,  but  on  non-terminals  as  well. 
So  defined,  this  mapping  subsumes  the  GOTO  mapping  entirely.  With  this  alteration,  the  LALR 
ACTION  table  can  be  used  to  recognize  not  only  strings  of  the  language,  but  also  sentential  forms. 
Note that the inclusion of the rules containing the marked symbol $\bar{X}$ permits the
recognition of a non-terminal representing a sub-sequence at any point within a sequence. Without
those rules, such non-terminals could only be recognized at the beginnings of sequences.
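
The effect of defining ACTION on non-terminals can be seen in a toy recognizer. The Python sketch below uses a hand-built table for the small grammar L -> L x | x; the table, the tagged ("shift", n) / ("reduce", n) / ("accept", None) entries, and the "$" end marker are assumptions of the example and do not reflect Ladle's table encoding. After a reduction, the lhs is simply pushed back onto the input and shifted through ACTION like any other symbol, so no separate GOTO table is needed, and a non-terminal may also appear directly in the input.

```python
# Rule number -> (lhs, length of rhs) for:  1: L -> x    2: L -> L x
RULES = {1: ("L", 1), 2: ("L", 2)}

# ACTION is defined on terminals and on the non-terminal L alike.
ACTION = {
    (0, "x"): ("shift", 1),
    (0, "L"): ("shift", 2),
    (1, "x"): ("reduce", 1),
    (1, "$"): ("reduce", 1),
    (2, "x"): ("shift", 3),
    (2, "$"): ("accept", None),
    (3, "x"): ("reduce", 2),
    (3, "$"): ("reduce", 2),
}


def recognize(symbols):
    """Return the list of rules reduced, or None if the input is rejected."""
    pending = list(symbols) + ["$"]
    stack = [0]
    reductions = []
    while True:
        action = ACTION.get((stack[-1], pending[0]))
        if action is None:
            return None
        kind, arg = action
        if kind == "accept":
            return reductions
        if kind == "shift":
            stack.append(arg)
            pending.pop(0)
        else:
            lhs, length = RULES[arg]
            del stack[len(stack) - length:]
            pending.insert(0, lhs)  # the lhs is then shifted via ACTION
            reductions.append(arg)
```

Here `recognize(["x", "x"])` parses a string of the language, while `recognize(["L", "x"])` accepts a sentential form that begins with a non-terminal.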

The presence of the $\bar{X}$ rules in V makes that grammar ambiguous. This ambiguity is resolved
in the construction of the ACTION table by treating $\bar{X}$ non-terminals somewhat like
terminals. $\bar{X}$ indicates only that the non-terminal $X$ is acceptable in this position. It
does not indicate that an $\alpha$ such that $X \Rightarrow^* \alpha$ is also acceptable. Stated in
the jargon of LALR parsing, in this context $X$ can be shifted, but not obtained via a reduction.
With this restriction, an LALR action table can be constructed for V without changing the intended
semantics of the $\bar{X}$ rules.

The LALR action table is output. It can be used to parse a concrete phrasal form that derives
from any given non-terminal. All that is necessary is the start state for the non-terminal and
a terminal symbol in the non-terminal’s follow set. Both of these are output for each concrete
non-terminal in $A_{top}$.

The sets $\Sigma$, $N$, $A$, $A_{chain}$, $N_{ambig}$, and $N^*$ are output.

The mappings $\psi \cap (A_{chain} \times N)$, action*, and data* are output.

For each parse rule $p \in P^*$, $lhs(p)$ and $|rhs(p)|$ are output.

4.4  Data  Representation 

The integers that represent the symbols and rules in the abstract and parse grammars are carefully
chosen to simplify Ladle table access. Since $\psi \cap (A_{top} \times N)$ provides such a strong
relationship between the abstract non-terminals in $N$ and the concrete/parse non-terminals in
$A_{top}$, the corresponding abstract and parse non-terminals in those sets are numbered
identically. The concrete non-terminals in $A$ are ordered so as to map $\prec$ onto $<$. The
abstract rules for each abstract non-terminal are ordered with all rules in $P_{cyclic}$ before all
other rules. The abstract rules as a whole are ordered by the order on their lhs’s. The concrete
states $\Pi$ are numbered so that for all $x \in \Pi$, if $Contract(x) \in P$, then $x$ is numbered
identically to $Contract(x)$; otherwise $x$ is numbered 0. The symbols and rules are indexed as
follows:

The ignored lexemes are not indexed.

The non-constant screened lexemes are 1 through K − 1.

The constant screened lexemes are K through L − 1.

The constant preserved or omitted lexemes are L through V − 1.

The non-constant preserved or omitted lexemes are V through S − 1.

The non-terminals in $N$ (or in $A_{top}$) are S through R − 1, with S (or Top(S)) first.

The non-terminals in $A \setminus A_{top}$ are R through A − 1.

The non-terminals in $A_{chain} \setminus A$ are A through C − 1.

The non-terminals in $N_{ambig} \setminus A_{chain}$ are C through m.

The non-terminals in $N^* \setminus N_{ambig}$ are m + 1 through n.

The abstract grammar rules $P$ are R through U − 1.

The parse grammar rules $P^*$ are 1 through r.

Figure 3 illustrates these numeric assignments. (U may or may not actually be less than C, m, or n,
but all of the other variables are ordered correctly.) Note that all lexemes, abstract
non-terminals, and abstract rules are assigned distinct values. They are collectively called
operators. Each node of a syntax tree IR has an operator that describes what the node represents.

[Figure 3: Integer assignment for the grammars. A number line marked 1, K, L, V, S, R, A, U, C, m,
and n, with ranges labeled screened lexemes, abstract lexemes (constant and var), abstract NTs (N),
abstract rules, A, abstracted parse NTs (A_chain), N_ambig, and parse NTs (N*).]


K, L, V, S, R, A, C, U, m, n, and r are all included in Ladle’s output. Note that the abstract
start symbol S and the concrete start symbol representative Top(S) are both numbered S.
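
Because the operator ranges are contiguous, the class of an operator can be decided by comparisons against the boundary values alone. The following Python sketch illustrates this for operators only (lexemes, abstract non-terminals, and abstract rules); the function name and the returned strings are hypothetical, and the boundary values in the usage note are made up.

```python
def operator_class(op, K, L, V, S, R, U):
    """Classify an operator index using the boundary values output by Ladle."""
    if 1 <= op < K:
        return "non-constant screened lexeme"
    if K <= op < L:
        return "constant screened lexeme"
    if L <= op < V:
        return "constant preserved or omitted lexeme"
    if V <= op < S:
        return "non-constant preserved or omitted lexeme"
    if S <= op < R:
        return "abstract non-terminal"
    if R <= op < U:
        return "abstract rule"
    raise ValueError(f"{op} is not an operator index")
```

With the invented boundaries K=3, L=5, V=20, S=24, R=30, U=50, index 24 is the abstract start symbol and indices 30 through 49 are abstract rules.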

For each abstract non-terminal X, define first(X) as the index of the first rule with X as its lhs.
Define first(R) = U, where R and U are the indices defined earlier. The set of rules for each
abstract non-terminal X is thus represented by the half-open integer interval
[first(X), first(X + 1)). first is the representation of these sets used in the output.
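
The interval encoding can be read back with a single range expression. In this Python sketch, first is modeled as a dictionary from non-terminal indices to rule indices; that representation and the sample numbers are assumptions of the example.

```python
def rules_of(first, nonterminal):
    """Return the rule indices of an abstract non-terminal.

    first -- mapping defined on every non-terminal index S .. R - 1 and
             additionally on R itself, where first[R] == U
    """
    return range(first[nonterminal], first[nonterminal + 1])
```

For a language with S = 24, R = 26, and U = 31, the table `{24: 26, 25: 29, 26: 31}` gives rules 26 through 28 to non-terminal 24 and rules 29 and 30 to non-terminal 25.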

Expand is represented by $(lhs \circ Expand) \cap (P \times P)$, $(rhs \circ Expand) \cap (P \times P)$,
and $(Contract \circ Coerce')$. For each abstract rule $p \in P$, $rhs(Expand(p))$ may be parsed
into $lhs(Expand(p))$ to yield $Expand(p)$. Using $(Contract \circ Coerce')$, this technique may be
generalized from single abstract rules to any non-null abstract derivation.

For each abstract rule $p \in P$, $A = lhs(Expand(p))$ and $\alpha = rhs(Expand(p))$ are output.
Since $\psi(A) = lhs(p)$ and $\psi(\alpha) = rhs(p)$, $A$ represents both $lhs(p)$ and
$(lhs \circ Expand)(p)$, and $\alpha$ represents both $rhs(p)$ and $(rhs \circ Expand)(p)$.

$(Contract \circ Coerce')$ is completely represented by the order of the concrete non-terminals and
the abstract rules, since $\prec$ is represented by $<$ and each rule $p \in P_{cyclic}$ is
represented by $first(lhs(p))$.

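$\psi \cap ((\Sigma \cup A_{top}) \times (\Sigma \cup N))$ is represented by the identity mapping,
so $\psi \cap ((\Sigma \cup A_{chain}) \times (\Sigma \cup N))$ is represented in the output by
$\psi \cap ((A_{chain} \setminus A_{top}) \times N)$ only. The representation of
$\psi \cap (A_{top} \times N)$ represents Top as well.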

The  action*  and  data*  mappings  output  are  modified  to  ignore  implicit  abstract  rules  and  null 
abstract  derivations  whenever  possible,  since  the  IR  tree  doesn’t  include  them. 

Bottom and $(lhs \circ Expand) \cap (P \times P)$ are represented by a single vector that includes
$1_\Sigma$. This vector maps each operator $X \in (\Sigma \cup N \cup P)$ onto a parse symbol
$\hat{X}$ that can be derived, using only trivial derivations, from the lhs of the expansion of the
abstract derivation represented by any syntax tree whose root operator is $X$. For all
$X \in (\Sigma \cup N \cup P)$, $lhs(Expand(X)) \Rightarrow^* \hat{X}$, since for all
$X \in (\Sigma \cup N)$, $X$ is isomorphic to $0_X$.

The names of lexemes, abstract non-terminals, and abstract rules are all combined in a single
vector. The sizes of lexemes and abstract non-terminals are gathered together in the same vector as
the arities of abstract rules.


5  Client  Interface 


The Ladle client interface provides a simple means of accessing the output tables. In the interface,
the names symbol, non-terminal, NT, and rule refer to the abstract grammar, unless otherwise
noted. Grammar symbols, rules, states, and so forth are represented by integers. No distinction is
made between the integer and what it represents.


5.1  Language  Forms 


(load-language language-name)

Load  and  return  the  Ladle  tables  for  the  language  named  language-name.  There  may  be 
optional  extra  parameters  to  this  function  that  specify  system  dependent  load  arguments. 

(with-language language
body)

Make  language  the  current  language  while  executing  body.  Many  of  the  other  forms  and 
functions  use  the  current  language. 


(current-language)

Return the current language, or false if there is none.

(language-name)

Return the current language’s name.


(language-top-operator)

Return the abstract and parse grammar start symbol S.

(do-operators (variable return-form)
body)

For each operator in the language, bind variable to the operator and execute body. The
return function may be used in the body as in all loops. Return the result of return-form,
or true if no return-form is given.


(do-lexemes (variable return-form)
body)

For each lexeme in the language, bind variable to the lexeme’s operator and execute body. The
return function may be used in the body as in all loops. Return the result of return-form,
or true if no return-form is given.

(do-NTs (variable return-form)
body)

For each abstract non-terminal, bind variable to the non-terminal’s operator and execute
body. The return function may be used in the body as in all loops. Return the result of
return-form, or true if no return-form is given.


5.2  Lexer  Data  Forms 


initial-lex-state 

The initial lex state and first argument to lex-action.

(lex-action lex-state character)

Return the lex action for the character when in the lex-state: false if an ignored lexeme
has been recognized, a lexeme if one that is not ignored has been recognized, the new state
otherwise. The lexeme may be lexical-error-operator or eos-lexeme-operator. Note
that a lexeme only specifies the lexical class; the characters recognized are not preserved here.


5.3  Parser  Data  Forms 


(initial-state-for-parse-NT  parse-NT) 

The  initial  parse  state  and  first  argument  to  parse-action  when  parsing  a  phrasal  form 
derived from the parse-NT. It is an error if parse-NT is not in $A_{top}$.

(parse-action  parse-state  parse-symbol) 

Return  the  LALR  parse  action  for  the  parse-symbol  when  in  the  parse-state:  a  new  state 
for  shift,  a  parse  rule  for  reduce,  or  false  for  error. 

(follow-lexeme-for-parse-NT parse-NT)

Return any lexeme in the follow set of the parse-NT. It is an error if parse-NT is not in $A_{top}$.

(parse-rule-lhs  parse-rule) 

Return the parse non-terminal lhs(parse-rule).

(parse-rule-length  parse-rule) 

Return  the  length  of  the  parse-rule. 
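
The way parse-action, parse-rule-lhs, and parse-rule-length fit together can be sketched as a conventional shift-reduce loop. The Python below is only an illustration under stated assumptions: `Rule`, `TABLE`, and the toy grammar (L -> L x | x) are invented for the example, and acceptance is modelled as a reduction by a hypothetical augmented rule `R_TOP`, which the descriptions above do not cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: str      # stands in for parse-rule-lhs
    length: int   # stands in for parse-rule-length


R_LIST = Rule("L", 2)   # L -> L x
R_ONE = Rule("L", 1)    # L -> x
R_TOP = Rule("TOP", 1)  # TOP -> L (hypothetical augmented rule)

# Hand-built table for the toy grammar: (state, symbol) -> state or Rule.
TABLE = {
    (0, "x"): 2, (0, "L"): 1,
    (1, "x"): 3, (1, "eos"): R_TOP,
    (2, "x"): R_ONE, (2, "eos"): R_ONE,
    (3, "x"): R_LIST, (3, "eos"): R_LIST,
}


def parse_action(state, symbol):
    """New state for shift, Rule for reduce, False for error."""
    return TABLE.get((state, symbol), False)


def parse(lexemes):
    """Return the list of rules reduced, or None on a syntax error.

    lexemes must end with "eos".
    """
    stack, pos, reductions = [0], 0, []
    while True:
        action = parse_action(stack[-1], lexemes[pos])
        if action is False:
            return None
        if isinstance(action, Rule):
            reductions.append(action)
            del stack[len(stack) - action.length:]   # pop the rule's rhs states
            if action is R_TOP:
                return reductions
            # Shift the rule's lhs non-terminal from the uncovered state.
            stack.append(parse_action(stack[-1], action.lhs))
        else:
            stack.append(action)                     # shift the lexeme
            pos += 1
```

Note that the same lookup serves both lexemes and non-terminals, matching the single parse-symbol argument of parse-action.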

(parse-rule-action  parse-rule) 

Return  action* (parse-rule) . 

(parse-rule-data  parse-rule) 
Return data* (parse-rule).


(disambiguate-parse-action  parse-rule  parse-rule-rhs-state-sequence) 

Return  the  disambiguated  action  for  the  parse-rule:  a  Contract  state  for  mark  or  an  abstract 
rule  for  abstract.  Parse-rule-rhs-state-sequence  contains  the  Contract  states  corresponding 
to each symbol in the parse-rule's rhs.


(parse-symbol-operator  parse-symbol) 

Return $\psi$(parse-symbol). Parse-symbol must be an element of $(\Sigma \cup A_{chain})$.


(operator-parse-symbol  operator) 

Return lhs(Expand(operator)), where for all $X \in (\Sigma \cup N)$, $X$ is isomorphic to $0_X$.


(list-parse-NT-to-NT-rule-cycle  from-parse-NT  to-parse-NT) 

Return the derivation Contract(Coerce1(from-parse-NT, to-parse-NT)) as a list of abstract
rules. This list may be destructively modified. It is an error if from-parse-NT or to-parse-NT
is not in A, or if $\psi$(from-parse-NT) $\neq$ $\psi$(to-parse-NT).


5.4  Generic  Operator  Forms 


syntactic-error-operator 

lexical-error-operator 

eos-lexeme-operator 

Each  syntactic  or  lexical  error  is  represented  by  the  syntactic-error-operator  or  the 
lexical-error-operator,  respectively.  Eos-lexeme-operator  represents  the  end  of  the 
lexical  stream.  It  is  returned  by  lex-action  and  expected  by  lalr-action.  These  operators 
are  collectively  called  the  special  operators. 

(find-named-operator  operator-name) 

Return  the  integer  assigned  to  the  operator  named  operator-name ,  or  false  if  there  is  no  such 
operator. 

(operator-text  operator) 

Return  the  text  for  operator.  It  is  an  error  for  operator  to  be  special.  If  operator  is  a  constant 
lexeme,  the  lexeme  itself  is  returned.  Otherwise,  the  name  of  the  lexeme,  non-terminal,  or 
rule  is  returned. 

(operator-is-special? operator)

Return  true  if  operator  is  special,  false  otherwise. 

(operator-is-constant?  operator) 

Return  false  if  operator  is  a  non-constant  lexeme,  true  otherwise. 

(operator-is-screened?  operator) 

Return  true  if  operator  is  a  screened  lexeme,  false  otherwise. 

(operator-is-lexeme? operator)

Return  true  if  operator  is  a  lexeme  or  is  special,  false  otherwise. 


(operator-is-symbol? operator)

Return  false  if  operator  is  a  rule,  true  otherwise. 

(operator-is-variable-arity?  operator) 

If  operator  is  a  delimited  sequence,  return  the  sequence’s  delimiter.  If  operator  is  the 
syntactic-error-operator  or  an  undelimited  sequence,  return  syntactic-error-operator. 
If  the  arity  of  operator  is  fixed  (possibly  zero),  return  nil. 


5.5  Symbol  Forms 


(symbol-size  symbol-operator) 

For  a  non-terminal,  return  the  number  of  characters  in  its  name.  For  a  constant  lexeme, 
return the number of characters in its representation. For non-constant lexemes, return 0. It
is an error if symbol-operator is a special operator.
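
The case analysis for symbol-size can be restated as a short Python sketch. The `Operator` record and its `kind` strings are hypothetical stand-ins for illustration; in the real interface an operator is an integer and its class is determined from Ladle's tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    kind: str   # hypothetical: "non-terminal", "constant-lexeme", "lexeme", "special"
    text: str   # name of a non-terminal, or representation of a constant lexeme


def symbol_size(op):
    """Mirror the symbol-size rules described above."""
    if op.kind == "special":
        raise ValueError("symbol-size is undefined for special operators")
    if op.kind in ("non-terminal", "constant-lexeme"):
        return len(op.text)
    return 0  # non-constant lexeme: no fixed representation
```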

(do-NT-rules (variable NT-operator return-form)
body)

For each abstract rule $\hat{p}$ for which lhs($\hat{p}$) = NT-operator, bind variable to the rule’s operator
and execute body. The return function may be used in the body as in all loops. Return the
result of return-form, or true if no return-form is given.

(NT-bottom  NT-operator) 

Return  Bottom(NT-operator). 


5.6  Rule  Forms 


(rule-kind rule-operator)

Return  the  kind  of  abstract  rule  rule-operator. 


(rule-lhs  rule-operator) 

Return lhs(Expand($\hat{p}$)), where $\psi$(lhs(Expand($\hat{p}$))) = lhs($\hat{p}$) and abstract rule $\hat{p}$ is represented
by rule-operator.


(rule-arity  rule-operator) 

Return the arity of abstract rule rule-operator.

(do-rule-rhs-operators-and-child-indices ((symbol-var child-var) rule-operator return-form)
body)

Let $\alpha$ = rhs(Expand($\hat{p}$)), where $\psi(\alpha)$ = rhs($\hat{p}$) and $\hat{p}$ is represented by rule-operator. For each
symbol in $\alpha$, execute body with symbol-var bound to the symbol’s operator and child-var
bound to the index of the corresponding child specification in the rule’s IR tree template,
or -1 if there is none. The return function may be used in the body as in all loops. Return
the result of return-form, or true if no return-form is given.


A  Notation  and  Conventions

A.1  Notation

Let  S  and  T  be  sets. 

Define $S \setminus T$ as the set difference of $S$ and $T$.

Define $I_S : S \to S$ as the identity mapping on $S$.

Let $S^*$ denote the application of the Kleene star operator to $S$.

Define $\mathrm{Subsets}(S) = \{S' \mid S' \subseteq S\}$.

Use the notation $\exists!\,x \ldots$ to denote “there exists a unique $x$ such that …”.
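
Let $f : D \to C$ be a mapping.

If $f$ is invertible, define $f^{-1} : C \to D$ as the $f$ inverse mapping.

Define the $f$ image mapping $f : \mathrm{Subsets}(D) \to \mathrm{Subsets}(C)$ by

$$\forall S \subseteq D,\quad f(S) = \{f(d) \mid d \in S\}.$$

Define the $f$ preimage mapping $f^{-1} : \mathrm{Subsets}(C) \to \mathrm{Subsets}(D)$ by

$$\forall S \subseteq C,\quad f^{-1}(S) = \{d \mid f(d) \in S\}.$$

Let the subsets $D' \subseteq D$ and $C' \subseteq C$ be given such that $f(D') \subseteq C'$.
Define the restriction mapping $f \cap (D' \times C')$ by

$$\forall d \in D',\quad (f \cap (D' \times C'))(d) = f(d).$$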
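
For finite sets, the image, preimage, and restriction mappings can be illustrated with a few lines of Python. This is a sketch for intuition only; the function names are invented here, and the preimage takes the domain explicitly because a Python function does not carry its domain with it.

```python
def image(f, subset):
    """f applied to a subset S of the domain: {f(d) | d in S}."""
    return {f(d) for d in subset}


def preimage(f, domain, subset):
    """Preimage of a subset S of the codomain: {d in domain | f(d) in S}."""
    return {d for d in domain if f(d) in subset}


def restrict(f, d_prime, c_prime):
    """Restriction of f to D' x C', returned as a dict; requires f(D') within C'."""
    table = {d: f(d) for d in d_prime}
    if not set(table.values()) <= set(c_prime):
        raise ValueError("f(D') is not contained in C'")
    return table
```

With `f = len` on the domain `{"a", "bb", "cc"}`, the preimage of `{2}` is `{"bb", "cc"}` even though `len` is not invertible, which is why the preimage is defined on subsets rather than on elements.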


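The empty string is represented by $\varepsilon$.

Let $f : D \to C$ be a mapping, where $D$ and $C$ are sets of symbol strings and $\varepsilon \notin D$.
Extend $f$ homomorphically to the mapping $f : D^* \to C^*$ by

$$f(\alpha\beta) = f(\alpha)\,f(\beta) \quad\text{and}\quad f(\varepsilon) = \varepsilon.$$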
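
The homomorphic extension can be sketched in Python by mapping each element of a sequence and concatenating the results. The names `extend` and `f_star` are illustrative only, and strings are modelled as tuples of symbols with `()` playing the role of the empty string.

```python
def extend(f):
    """Extend f (element -> tuple of symbols) to sequences of elements."""
    def f_star(elements):
        out = []
        for element in elements:
            out.extend(f(element))   # f(ab) = f(a) f(b)
        return tuple(out)            # the empty sequence maps to ()
    return f_star
```

Because the result is built by concatenation, the two defining equations hold by construction: the extension of a concatenation is the concatenation of the extensions, and the empty sequence maps to the empty sequence.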


A context-free grammar $\mathcal{G}$ is a tuple $(N, \Sigma, P, S)$, where $N$ is the set of non-terminal symbols, $\Sigma$ is
the set of terminal symbols, $P$ is a set of rewrite rules, and $S \in N$ is the initial symbol.

A rewrite rule in $P$ is written $(A \to \alpha)$, where $A \in N$ and $\alpha \in (\Sigma \cup N)^*$.

$A$ is the left-hand-side or lhs, and $\alpha$ is the right-hand-side or rhs.

A chain rule is any rule of the form $(A \to B)$, where $B \in N$. An empty rule is any rule of the form
$(A \to \varepsilon)$.

For any $M \subseteq N$, define $P_{\neg M} \subseteq P$ by $P_{\neg M} = \{(A \to \alpha) \in P \mid A \notin M\}$.

$\alpha \underset{P'}{\overset{\pi}{\Longrightarrow}} \beta$ denotes a derivation that rewrites $\alpha \in (\Sigma \cup N)^*$ as $\beta \in (\Sigma \cup N)^*$ by applying a sequence $\pi$
of rewrite rules in $P' \subseteq P$.

In general, a derivation may be of any length, including zero.

For any grammar symbol $A \in (\Sigma \cup N)$, define $0_A$ as the zero length derivation for which $A \overset{0_A}{\Longrightarrow} A$.
A zero length derivation is also called a null derivation.

$\alpha \overset{p}{\Longrightarrow} \beta$ denotes a derivation consisting of exactly one rule.

$\alpha \Longrightarrow \beta$ denotes a derivation without naming it.

$P'$ may actually be a set of derivations rather than of rules. If $P'$ is not specified, $P$ is assumed.

When $\alpha \Longrightarrow \beta$ and $\alpha \in N$, $\beta$ is called a phrasal form. When $\alpha = S$, $\beta$ is called a sentential form.


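Let $(\alpha \overset{\pi_0}{\Longrightarrow} \beta)$ and $(\delta \overset{\pi_1}{\Longrightarrow} \gamma)$ be derivations such that $\beta = \xi\delta\zeta$, for some $\alpha, \beta, \delta, \gamma, \xi, \zeta \in (\Sigma \cup N)^*$.

Then $(\alpha \overset{\pi_0\pi_1}{\Longrightarrow} \xi\gamma\zeta)$ is a concatenation of $\pi_0$ and $\pi_1$.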

Note  that  there  may  be  many  ways  to  concatenate  two  derivations. 
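
The reason there may be many concatenations is that the string rewritten by the second derivation can occur at several positions in the result of the first. The Python sketch below enumerates them; it tracks only the resulting strings (as tuples of symbols), not the rule sequences, and the function name is invented for the example.

```python
def concatenations(beta, delta, gamma):
    """Results of every concatenation: one per occurrence of delta in beta.

    beta is the result of the first derivation; the second derivation
    rewrites delta as gamma. All three are tuples of grammar symbols.
    """
    n = len(delta)
    results = []
    for i in range(len(beta) - n + 1):
        if beta[i:i + n] == delta:
            # beta splits as prefix + delta + suffix at this position
            results.append(beta[:i] + gamma + beta[i + n:])
    return results
```

For example, if the first derivation ends in `("A", "+", "A")` and the second rewrites `("A",)` as `("x",)`, there are two distinct concatenations, one per occurrence of `A`.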


Let $\Pi_0$ and $\Pi_1$ be sets of derivations.

Define $\Pi_0\Pi_1$ as the set of derivations that can be constructed by concatenating a derivation in $\Pi_0$
with a derivation in $\Pi_1$.

Define $\Pi_0^*$ as the set of derivations that can be constructed by concatenating zero or more derivations
in $\Pi_0$.

Define $\Pi_0^+$ as the set of derivations that can be constructed by concatenating one or more derivations
in $\Pi_0$.


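The mappings $\mathrm{lhs} : P^* \to N$ and $\mathrm{rhs} : P^* \to (\Sigma \cup N)^*$ are defined by

$$\forall \pi \in P^*, \text{ where } A \overset{\pi}{\Longrightarrow} \alpha,\ A \in N, \text{ and } \alpha \in (\Sigma \cup N)^*: \qquad \mathrm{lhs}(\pi) = A, \quad \mathrm{rhs}(\pi) = \alpha.$$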


Let $\pi \in P^*$ be a derivation and $P' \subseteq P$ be a set of rules.

Define the derivation $\pi \setminus P'$ as the derivation $\pi$ with all of the rules in $P'$ removed.

Care must be exercised with this operation, as $\pi \setminus P'$ is not always actually a derivation.

For any grammar $\mathcal{G} = (N, \Sigma, P, S)$, $\mathrm{Lang}(\mathcal{G})$ is the set of terminal strings that can be derived from
$S$ using the rules in $P$.
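
Since the language of a grammar is usually infinite, an illustration has to bound it. The following Python sketch enumerates the terminal strings of at most a given length by breadth-first search over leftmost rewrites. The function name and the tuple encoding of rules are assumptions made for the example, and the length-based pruning is only complete for grammars without empty rules.

```python
from collections import deque


def lang_up_to(nonterminals, rules, start, max_len):
    """Terminal strings of length <= max_len derivable from start.

    rules is a list of (lhs, rhs) pairs, with rhs a tuple of symbols.
    Assumes no empty rules, so a form never shrinks and forms longer
    than max_len can be discarded.
    """
    seen = {(start,)}
    queue = deque(seen)
    result = set()
    while queue:
        form = queue.popleft()
        # Index of the leftmost non-terminal, or None if the form is terminal.
        idx = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if idx is None:
            result.add("".join(form))
            continue
        for lhs, rhs in rules:
            if lhs == form[idx]:
                new = form[:idx] + tuple(rhs) + form[idx + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return result
```

The `seen` set keeps the search finite even when the grammar contains chain-rule cycles, because only finitely many forms fit under the length bound.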

A.2  Conventions

$\mathcal{G}$ is a context-free grammar.

$A$, $B$, $C$, and $A'$ are non-terminals.
$p$ and $q$ are rewrite rules.

$\alpha$, $\beta$, $\gamma$, $\delta$, $\xi$, and $\zeta$ are strings of terminals and non-terminals.
$\pi$ is a derivation.

$\bot$ is a symbol such that $\bot \notin (\Sigma \cup N)$.

All symbols relating to abstract grammars have hats on them; e.g. $\hat{A}$, $\hat{p}$, $\hat{\alpha}$, $\hat{\pi}$, and so forth.
Conversely, no symbol that does not relate to abstract grammars has a hat on it.


B  Ladle  in  Ladle


LANGUAGE  ladle 

/*  A  Ladle  description  of  the  Ladle  input  syntax.  */ 
LEXICAL 

whitespace  =  {  \t\n\“L>  =>  IGNORE; 
comment  =  "/*"  ”  "*/"  =>  SCREEN; 


string  =  "  "V"  5 

case_insensitive_string  » 
char.set  =  -  "}"  ; 

index  =  "<"  {0-9}+  ">"  ; 


identifier    = {a-zA-Z}({_0-9a-zA-Z}*) ;
key_abstract  = 'ABSTRACT'  IN identifier;
key_annotate  = 'ANNOTATE'  IN identifier;
key_concrete  = 'CONCRETE'  IN identifier;
key_ignore    = 'IGNORE'    IN identifier;
key_implicit  = 'IMPLICIT'  IN identifier;
key_in        = 'IN'        IN identifier;
key_language  = 'LANGUAGE'  IN identifier;
key_lexical   = 'LEXICAL'   IN identifier;
key_omit      = 'OMIT'      IN identifier;
key_preserve  = 'PRESERVE'  IN identifier;
key_screen    = 'SCREEN'    IN identifier;
key_tree      = 'TREE'      IN identifier;


ABSTRACT 

ladle_specification = 'LANGUAGE' identifier
                      'LEXICAL' lexical_definition_seq
                      'ABSTRACT' abstract_definition_seq
                      opt_concrete_section  =>  TREE;

opt_concrete_section = ['CONCRETE' concrete_definition_seq] ;

lexical_definition_seq = lexical_definition*  =>  TREE;


lexical_definition = identifier "="
                     lexical_expr lexical_disposition  =>  TREE;


lexical_expr = lexical_expr ++ "|"            =>  or
             | lexical_expr ++                =>  concatenate
             | lexical_expr string            =>  match
             | string string                  =>  balanced_match
             | any_string 'IN' identifier     =>  lex_in
             | lexical_expr "*"               =>  lex_star
             | lexical_expr "+"               =>  lex_plus
             | "[" lexical_expr "]"           =>  lex_optional
             | "(" lexical_expr ")"           =>  ANNOTATE
             | string
             | case_insensitive_string
             | char_set


any_string = string
           | case_insensitive_string

lexical_disposition =                     =>  l_default_disposition
                    | "=>" 'IGNORE'       =>  l_ignore
                    | "=>" 'SCREEN'       =>  l_screen
                    | "=>" 'OMIT'         =>  l_omit
                    | "=>" 'PRESERVE'     =>  l_preserve


rhs = grammar_symbol_seq "*" opt_grammar_symbol    =>  star
    | grammar_symbol_seq "+" opt_grammar_symbol    =>  plus
    | grammar_symbol_seq "++" opt_grammar_symbol   =>  plural
    | "[" grammar_symbol_seq "]"                   =>  optional
    | grammar_symbol_seq


grammar_symbol_seq = grammar_symbol*  =>  TREE;

opt_grammar_symbol = [ grammar_symbol ]


grammar_symbol = string
               | case_insensitive_string
               | identifier

abstract_definition_seq = abstract_definition+  =>  TREE;
abstract_definition = identifier abstract_rule_seq  =>  TREE;


abstract_rule_seq = abstract_rule + "|"  =>  TREE;


abstract_rule = rhs tree_template  =>  TREE;


tree_template =                                              =>  default_template
              | "=>" 'IMPLICIT'                              =>  implicit_template
              | "=>" 'ANNOTATE' abstract_child               =>  annotate_template
              | "=>" 'ANNOTATE'                              =>  annotate_default_template
              | "=>" identifier "(" abstract_child_seq ")"   =>  general_template
              | "=>" identifier                              =>  named_template
              | "=>" 'TREE'                                  =>  simple_template


abstract_child_seq = abstract_child *  =>  TREE;
abstract_child = grammar_symbol opt_index  =>  TREE;
opt_index = [ index ]  =>  TREE;


concrete_definition_seq = concrete_definition*  =>  TREE;
concrete_definition = identifier "=" concrete_rule_seq  =>  TREE;
concrete_rule_seq = rhs + "|"  =>  TREE;


CONCRETE 

lexical_expr = lexical_term_seq ++ "|"
             | lexical_term_seq


lexical_term_seq = lexical_term++


lexical_term = lexical_factor string
             | string """ string
             | any_string 'IN' identifier
             | lexical_factor

lexical_factor = lexical_primary
               | lexical_primary "+"
               | lexical_primary


lexical_primary = "(" lexical_expr ")"
                | "[" lexical_expr "]"
                | string
                | case_insensitive_string
                | char_set




C  Normalizing  Extended  Context-Free  Grammars 


The theory of expansion as described in Section 2 does not include the special sequence operators
defined in Section 3.2. A context-free grammar containing these operators can be converted into
an equivalent context-free grammar without them by normalizing each of its sequence rules.
For each normal transformation, let $A \in N$, $\alpha \in (\Sigma \cup N)^*$ such that $|\alpha| \neq 0$, and $d \in \Sigma \cup \{\varepsilon\}$ be
given. For each sequence rule, let a distinct new non-terminal $X$ be given. The transformations are:

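$$
\begin{array}{lcl}
A \to \alpha * d   & \text{becomes} & A \to \varepsilon \\
                   &                & A \to X \\
                   &                & X \to \alpha \\
                   &                & X \to X d \alpha \\[1ex]
A \to \alpha + d   & \text{becomes} & A \to X \\
                   &                & X \to \alpha \\
                   &                & X \to X d \alpha \\[1ex]
A \to \alpha {+}{+}\, d & \text{becomes} & A \to X \\
                   &                & X \to \alpha d \alpha \\
                   &                & X \to X d \alpha
\end{array}
$$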

Each new concrete non-terminal $X$ defined by transforming a rule of a concrete grammar is mapped
by $\psi$ onto the new abstract non-terminal $\hat{X}$ defined by transforming the similar rule in the corresponding
abstract grammar.
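
The three transformations are mechanical enough to sketch as a single Python function. The function name, the string encoding of the operators, and the use of `None` for an empty delimiter are assumptions made for this illustration; rules are returned as (lhs, rhs) pairs with rhs a tuple of symbols.

```python
def normalize(lhs, alpha, op, d, fresh):
    """Rewrite the sequence rule  lhs -> alpha op d  into plain rules.

    alpha: non-empty tuple of grammar symbols
    op:    "*", "+", or "++"
    d:     delimiter symbol, or None for the empty string
    fresh: name of the new non-terminal X introduced for this rule
    """
    if not alpha:
        raise ValueError("alpha must be non-empty")
    if op not in ("*", "+", "++"):
        raise ValueError("unknown sequence operator")
    delim = () if d is None else (d,)
    rules = []
    if op == "*":
        rules.append((lhs, ()))                    # A -> (empty)
    rules.append((lhs, (fresh,)))                  # A -> X
    if op == "++":
        rules.append((fresh, alpha + delim + alpha))   # X -> alpha d alpha
    else:
        rules.append((fresh, alpha))                   # X -> alpha
    rules.append((fresh, (fresh,) + delim + alpha))    # X -> X d alpha
    return rules
```

For instance, `normalize("A", ("a",), "*", ",", "X")` yields the four rules of the first transformation, with `a` as the repeated string and `,` as the delimiter.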